8-oxoguanine DNA glycosylase polypeptides

ABSTRACT

The invention provides isolated polynucleotides encoding polypeptides having 8-oxoguanine DNA glycosylase (OGG) activity. The present invention provides methods and compositions relating to expressing OGG in plants in order to improve transformation efficiency, homologous recombination and/or targeted gene modifications. The invention further provides recombinant expression cassettes, host cells, transgenic plants, and antibody compositions.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of co-pending U.S. application Ser. No. 09/755,508 filed Jan. 5, 2001, and claims the benefit of U.S. application Ser. No. 60/174,681 filed Jan. 6, 2000, all of which are herein incorporated in entirety by reference.

TECHNICAL FIELD

[0002] The present invention relates generally to plant molecular biology. More specifically, it relates to nucleic acids and methods for modulating their expression in plants.

BACKGROUND OF THE INVENTION

[0003] A variety of environmental agents such as gamma radiation, UV light in the 320-380 nm range, ozone, heat, and various chemicals cause oxidative damage to cellular DNA. Similarly, reactive oxygen species (such as hydroxyl radicals and superoxide) and nitric oxide species generated in vivo cause oxidative damage to DNA (Friedberg, E. et al., in DNA Repair and Mutagenesis, American Society for Microbiology Press, Washington D.C., pages 14-19, 1995). The precise nature of the DNA modification varies depending upon the exposure and the type of causative agent. Modifications such as breakage of the phosphodiester bond have been reported, as well as oxidative stress-induced illegitimate recombination in bacteria (Ouchane, S. et al., EMBO J. 16:4777-4787, 1997). However, the most common result of oxidative damage is the oxidation of bases and sugars. Formamidopyrimidine (Fapy), 8-hydroxyguanine and 8-oxo-7,8-dihydrodeoxyguanosine are the most commonly observed base modifications following oxidative damage. Of these, 8-hydroxyguanine is considered highly mutagenic. It causes G:C to T:A transversions because 8-hydroxyguanine can pair with adenine and cytosine nucleotides with almost equal efficiencies during DNA replication (Shibutani, A. et al., Nature 349:431-434, 1991; Maki, H. and Sekiguchi M., Nature 355:273-275, 1992).

[0004] Consequently, all living organisms have developed specific enzymatic pathways to remove such lesions and to maintain genomic stability. These enzymatic pathways have been very well characterized in bacteria and in lower eukaryotes such as yeast. The suspected involvement of oxidative DNA damage in the development of malignancies has also prompted a detailed analysis of these pathways in mammalian systems such as humans. However, these pathways have not been well studied in plants such as maize.

[0005] In E. coli, three genes, mutM, mutY, and mutT, encode the enzymes responsible for the removal of Fapy and 8-hydroxyguanine lesions. Their gene products are members of the DNA glycosylase family. The mutY gene product specifically removes the unmodified A from the 8-hydroxyguanosine:A pair. The mutT gene product, on the other hand, preferentially hydrolyzes 8-oxo-7,8-dihydrodeoxyguanosine triphosphate, thereby preventing its incorporation into DNA. E. coli mutants of these genes show a mutator phenotype with a 10- to 1000-fold increase in transversions compared to wild type. In addition to the mutator phenotype, E. coli mutM mutants show increased illegitimate or non-homologous recombination. Furthermore, the mutM gene product suppresses this illegitimate recombination (Onda, M. et al., Genetics 151:439-446, 1999). Thus, overexpression of the mutM gene product may be used as a tool to suppress mutations in general and oxidative stress-induced non-homologous recombination in particular.

[0006] Recent studies have revealed the presence of mutM orthologues in yeast, human, and Arabidopsis thaliana (van der Kemp, P A et al., PNAS 93:5197-5202, 1996; Arai, K. et al., Oncogene 14:2857-2861, 1997; Radicella, J P et al., PNAS 94:8010-8015, 1997; Ohtsubo, T. et al., Mol. Gen. Genet. 259:577-590, 1998). The present invention presents a full-length cDNA encoding a maize orthologue of E. coli mutM. Unlike the animal mutM orthologues, the maize enzyme contains a C-terminal region of alternating acidic and basic amino acid residues and a putative nuclear localization signal, as shown in Example 4. The mutM orthologue of the present invention may be useful as a suppressor of DNA mutations induced by oxidative damage. Furthermore, it may be used to reduce illegitimate recombination, thereby increasing the frequencies of homologous recombination and transformation. Control of these processes has important implications for the creation of novel recombinantly engineered crops such as maize. The present invention provides for these and other advantages.

SUMMARY OF THE INVENTION

[0007] Generally, it is an object of the present invention to provide nucleic acids and proteins relating to maize mutM. It is an object of the present invention to provide expression cassettes, host cells and transgenic plants comprising the nucleic acids of the present invention, and methods for modulating, in a transgenic plant, the expression of the nucleic acids of the present invention in order to improve the efficiency of homologous recombination or transformation, or to induce targeted gene changes. It is also an object of the present invention to provide antibody compositions for detecting the polypeptides of the present invention.

[0008] In other aspects, the present invention relates to: 1) recombinant expression cassettes comprising a nucleic acid of the present invention operably linked to a promoter; 2) a non-human host cell into which the recombinant expression cassette has been introduced; and 3) a transgenic plant comprising the recombinant expression cassette.

[0009] Definitions

[0010] Units, prefixes, and symbols may be denoted in their SI-accepted form. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation, and amino acid sequences are written left to right in amino to carboxy orientation. Numeric ranges recited within the specification are inclusive of the numbers defining the range and include each integer within the defined range. Amino acids may be referred to herein by either their commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes. Unless otherwise provided for, software, electrical, and electronics terms as used herein are as defined in The New IEEE Standard Dictionary of Electrical and Electronics Terms (5th edition, 1993). The terms defined below are more fully defined by reference to the specification as a whole.

[0011] By “amplified” is meant the construction of multiple copies of a nucleic acid sequence, or multiple copies complementary to the nucleic acid sequence, using at least one of the nucleic acid sequences as a template. Amplification systems include the polymerase chain reaction (PCR) system, ligase chain reaction (LCR) system, nucleic acid sequence based amplification (NASBA, Cangene, Mississauga, Ontario), Q-Beta Replicase systems, transcription-based amplification system (TAS), and strand displacement amplification (SDA). See, e.g., Diagnostic Molecular Microbiology: Principles and Applications, D. H. Persing et al., Ed., American Society for Microbiology, Washington, D.C. (1993). The product of amplification is termed an amplicon.
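As a rough illustration of why PCR yields many copies of a template, the following minimal sketch computes an idealized exponential yield. The function name and the per-cycle `efficiency` parameter (fraction of templates copied each cycle) are hypothetical additions for illustration; reaction plateau effects and the kinetics of the other listed systems (LCR, NASBA, TAS, SDA) are not modeled.

```python
def amplicon_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the template count by (1 + efficiency).

    efficiency is a hypothetical per-cycle copying fraction (1.0 = perfect doubling).
    Real reactions plateau as primers and nucleotides are consumed; that is not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0.0 and 1.0")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # 10 template molecules, 30 cycles of perfect doubling
    print(f"{amplicon_copies(10, 30):.3e}")
```

With perfect doubling, 10 templates become 10 × 2^30 (about 1.07 × 10^10) amplicons after 30 cycles.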

[0012] As used herein, “antisense orientation” includes reference to a duplex polynucleotide sequence that is operably linked to a promoter in an orientation where the antisense strand is transcribed. The antisense strand is sufficiently complementary to an endogenous transcription product such that translation of the endogenous transcription product is often inhibited.
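For a DNA sequence written 5′ to 3′, the strand transcribed in antisense orientation is its reverse complement. A minimal Python sketch, assuming an unambiguous DNA alphabet plus N; the function name `antisense_strand` is hypothetical:

```python
# Map each DNA base to its Watson-Crick partner (N, an unknown base, maps to itself).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def antisense_strand(sense_5to3: str) -> str:
    """Return the antisense (complementary) strand of a DNA sequence, written 5' to 3'.

    Complementing gives the partner strand read 3' to 5'; reversing it restores
    the 5' to 3' convention used throughout this specification.
    """
    return sense_5to3.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense_strand("ATGGCCAAG"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check.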

[0013] By “encoding” or “encoded”, with respect to a specified nucleic acid, is meant comprising the information for translation into the specified protein. A nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid, or may lack such intervening non-translated sequences (e.g., as in cDNA). The information by which a protein is encoded is specified by the use of codons. Typically, the amino acid sequence is encoded by the nucleic acid using the “universal” genetic code. However, variants of the universal code, such as are present in some plant, animal, and fungal mitochondria, the bacterium Mycoplasma capricolum, or the ciliate macronucleus, may be used when the nucleic acid is expressed therein.
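To make codon-based encoding concrete, the sketch below translates a coding sequence with the standard (“universal”) code, NCBI translation table 1, and with the Mycoplasma variant, NCBI table 4, in which UGA encodes tryptophan rather than stop. The function and constant names are hypothetical, and only these two tables are modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard ("universal") code, NCBI translation table 1; amino acids listed in the
# codon order TTT, TTC, TTA, TTG, TCT, ..., GGG, with "*" marking stop codons.
_TABLE1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), _TABLE1)
}
# Mycoplasma/Spiroplasma variant (NCBI table 4): UGA (TGA) encodes Trp, not stop.
MYCOPLASMA_CODE = {**STANDARD_CODE, "TGA": "W"}


def translate(cds: str, code: dict[str, str] = STANDARD_CODE) -> str:
    """Translate a coding sequence codon by codon until the first stop codon."""
    seq = cds.upper().replace("U", "T")  # accept mRNA (U) or cDNA (T)
    protein = []
    for i in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
        aa = code[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGTGGTGATAA"), translate("ATGTGGTGATAA", MYCOPLASMA_CODE))
```

The same sequence yields a shorter product under the universal code because TGA is read as a stop codon there.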

[0014] When the nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended host where the nucleic acid is to be expressed. For example, although nucleic acid sequences of the present invention may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. 17:477-498 (1989)). Thus, the maize-preferred codon for a particular amino acid may be derived from known gene sequences from maize. Maize codon usage for 28 genes from maize plants is listed in Table 4 of Murray et al., supra.
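Deriving a preferred codon from known gene sequences amounts to picking, for each amino acid, the codon with the highest observed count. The following is a minimal sketch under that assumption; the codon counts are invented for illustration and are not the values tabulated by Murray et al., and the function names are hypothetical.

```python
def preferred_codons(usage: dict[str, dict[str, int]]) -> dict[str, str]:
    """For each amino acid, return the codon seen most often in a reference gene set."""
    return {aa: max(counts, key=counts.get) for aa, counts in usage.items()}


def recode(protein: str, preferred: dict[str, str]) -> str:
    """Back-translate a protein using one preferred codon per amino acid."""
    return "".join(preferred[aa] for aa in protein)


# Hypothetical counts for illustration only; real values would come from a maize
# codon-usage table such as Table 4 of Murray et al.
EXAMPLE_USAGE = {
    "A": {"GCC": 40, "GCG": 30, "GCT": 20, "GCA": 10},
    "K": {"AAG": 55, "AAA": 15},
}

if __name__ == "__main__":
    best = preferred_codons(EXAMPLE_USAGE)
    print(best, recode("KAK", best))
```

Always choosing the single most frequent codon is the simplest strategy; a GC-content target, as mentioned above, could be added as a further constraint.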

[0015] As used herein, “full-length sequence” in reference to a specified polynucleotide or its encoded protein means having the entire amino acid sequence of a native (non-synthetic), endogenous, biologically active form of the specified protein. Methods to determine whether a sequence is full-length are well known in the art, including such exemplary techniques as northern or western blots, primer extension, S1 protection, and ribonuclease protection. See, e.g., Plant Molecular Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997). Comparison to known full-length homologous (orthologous and/or paralogous) sequences can also be used to identify full-length sequences of the present invention. Additionally, consensus sequences typically present at the 5′ and 3′ untranslated regions of mRNA aid in the identification of a polynucleotide as full-length. For example, the consensus sequence ANNNNAUGG, where the AUG codon represents the N-terminal methionine, aids in determining whether the polynucleotide has a complete 5′ end. Consensus sequences at the 3′ end, such as polyadenylation sequences, aid in determining whether the polynucleotide has a complete 3′ end.
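A search for the ANNNNAUGG start context can be sketched with a regular expression, where N is any nucleotide. A match is only supporting evidence of a complete 5′ end, not proof; the function name `start_contexts` is hypothetical.

```python
import re

# ANNNNAUGG: an A, any four nucleotides, the AUG start codon, then G.
# The zero-width lookahead lets overlapping matches be reported.
_START_CONTEXT = re.compile(r"(?=(A[ACGU]{4}AUGG))")


def start_contexts(mrna: str) -> list[int]:
    """Return 0-based positions of each AUG that sits in an ANNNNAUGG context."""
    seq = mrna.upper().replace("T", "U")  # accept cDNA written with T
    return [m.start() + 5 for m in _START_CONTEXT.finditer(seq)]


if __name__ == "__main__":
    print(start_contexts("GGACCCCAUGGCU"))
```

An AUG found this way near the 5′ end of a clone, upstream of a long open reading frame, is a reasonable candidate for the N-terminal methionine.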

[0016] As used herein, “heterologous” in reference to a nucleic acid is a nucleic acid that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous structural gene is from a species different from that from which the structural gene was derived, or, if from the same species, one or both are substantially modified from their original form. A heterologous protein may originate from a foreign species or, if from the same species, is substantially modified from its original form by deliberate human intervention.

[0017] By “host cell” is meant a cell which contains a vector and supports the replication and/or expression of the vector. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or mammalian cells. Host cells can also be monocotyledonous or dicotyledonous plant cells. An example of a monocotyledonous host cell is a maize host cell.

[0018] The term “introduced”, in the context of inserting a nucleic acid into a cell, means “transfection” or “transformation” or “transduction” and includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell (e.g., chromosome, plasmid, plastid or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (e.g., transfected mRNA).

[0019] As used herein, “transformation” includes stable transformation and transient transformation unless indicated otherwise.

[0020] As used herein, “stable transformation” refers to the transfer of a nucleic acid fragment into a genome of a host organism (including both nuclear and organelle genomes) resulting in genetically stable inheritance. In addition to traditional methods, stable transformation includes the alteration of gene expression by any means, including chimeraplasty or transposon insertion.

[0021] As used herein, “transient transformation” refers to the transfer of a nucleic acid fragment or protein into the nucleus (or DNA-containing organelle) of a host organism resulting in gene expression without integration and stable inheritance.

[0022] The term “isolated” refers to material, such as a nucleic acid or a protein, which is: (1) substantially or essentially free from components that normally accompany or interact with it as found in its naturally occurring environment, where the isolated material optionally comprises material not found with the material in its natural environment; or (2) if the material is in its natural environment, material that has been synthetically (non-naturally) altered by deliberate human intervention to a composition and/or placed at a location in the cell (e.g., genome or subcellular organelle) not native to a material found in that environment. The alteration to yield the synthetic material can be performed on the material within or removed from its natural state. For example, a naturally occurring nucleic acid becomes an isolated nucleic acid if it is altered, or if it is transcribed from DNA which has been altered, by means of human intervention performed within the cell from which it originates. See, e.g., Compounds and Methods for Site Directed Mutagenesis in Eukaryotic Cells, Kmiec, U.S. Pat. No. 5,565,350; In Vivo Homologous Sequence Targeting in Eukaryotic Cells, Zarling et al., PCT/US93/03868. Likewise, a naturally occurring nucleic acid (e.g., a promoter) becomes isolated if it is introduced by non-naturally occurring means to a locus of the genome not native to that nucleic acid. Nucleic acids which are “isolated” as defined herein are also referred to as “heterologous” nucleic acids.

[0023] Unless otherwise stated, the term “maize mutM nucleic acid” is a nucleic acid of the present invention and means a nucleic acid comprising a polynucleotide of the present invention (a “maize mutM polynucleotide”) encoding a maize mutM polypeptide. A “maize mutM gene” is a gene of the present invention and refers to a heterologous genomic form of a full-length maize mutM polynucleotide.

[0024] As used herein, “nucleic acid” includes reference to a deoxyribonucleotide or ribonucleotide polymer, or chimeras thereof, in either single- or multi-stranded form, and, unless otherwise limited, encompasses known analogues having the essential nature of natural nucleotides in that they hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides (e.g., peptide nucleic acids). As used herein, “nucleic acid” and “polynucleotide” are used interchangeably. A polynucleotide can be full-length or a subsequence of a native or heterologous structural or regulatory gene. Unless otherwise indicated, the term includes reference to the specified sequence as well as the complementary sequence thereof. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art. The term polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including, among other things, simple and complex cells.

[0025] By “nucleic acid library” is meant a collection of isolated DNA or RNA molecules which comprise and substantially represent the entire transcribed fraction of a genome of a specified organism or of a tissue from that organism. Construction of exemplary nucleic acid libraries, such as genomic and cDNA libraries, is taught in standard molecular biology references such as Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology Vol. 152, Academic Press, Inc., San Diego, Calif. (Berger); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Vol. 1-3 (1989); and Current Protocols in Molecular Biology, F. M. Ausubel et al., Eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (1994).

[0026] As used herein, “operably linked” includes reference to a functional linkage between a promoter and a second sequence, wherein the promoter sequence initiates and mediates transcription of the DNA sequence corresponding to the second sequence. Generally, operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, contiguous and in the same reading frame.

[0027] As used herein, the term “plant” includes reference to whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same. Plant cell, as used herein, includes, without limitation, seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores. The class of plants which can be used in the methods of the invention includes both monocotyledonous and dicotyledonous plants.

[0028] The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers. The essential nature of such analogues of naturally occurring amino acids is that, when incorporated into a protein, that protein is specifically reactive to antibodies elicited to the same protein but consisting entirely of naturally occurring amino acids. The terms “polypeptide”, “peptide” and “protein” are also inclusive of modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation. Further, this invention contemplates the use of both the methionine-containing and the methionine-less amino terminal variants of the protein of the invention.

[0029] As used herein, “promoter” includes reference to a region of DNA upstream from the start of transcription and involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A “plant promoter” is a promoter capable of initiating transcription in plant cells whether or not its origin is a plant cell. Exemplary plant promoters include, but are not limited to, those that are obtained from plants, plant viruses, and bacteria which comprise genes expressed in plant cells, such as Agrobacterium or Rhizobium. Examples of promoters under developmental control include promoters that preferentially initiate transcription in certain tissues, such as leaves, roots, or seeds. Such promoters are referred to as “tissue preferred”. Promoters which initiate transcription only in certain tissues are referred to as “tissue specific”. A “cell type” specific promoter primarily drives expression in certain cell types in one or more organs, for example, vascular cells in roots or leaves. An “inducible” or “repressible” promoter is a promoter which is under environmental control. Examples of environmental conditions that may affect transcription by inducible promoters include anaerobic conditions or the presence of light. Tissue specific, tissue preferred, cell type specific, and inducible promoters constitute the class of “non-constitutive” promoters. A “constitutive” promoter is a promoter which is active under most environmental conditions.

[0030] The term “maize mutM polypeptide” is a polypeptide of the present invention in the DNA glycosylase family and refers to one or more amino acid sequences, in glycosylated or non-glycosylated form. MutM polypeptides are enzymes that can repair specific oxidative damage to DNA by removing 7,8-dihydro-8-oxoguanine (8-oxoG) bases from DNA. The term is also inclusive of fragments, variants, homologs, alleles or precursors (e.g., preproproteins or proproteins) thereof. A “maize mutM protein” is a protein of the present invention and comprises a maize mutM polypeptide.

[0031] As used herein, “recombinant” includes reference to a cell or vector that has been modified by the introduction of a heterologous nucleic acid, or to a cell derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found in identical form within the native (non-recombinant) form of the cell, or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all as a result of deliberate human intervention. The term “recombinant” as used herein does not encompass the alteration of the cell or vector by naturally occurring events (e.g., spontaneous mutation, natural transformation/transduction/transposition) such as those occurring without deliberate human intervention.

[0032] As used herein, a “recombinant expression cassette” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements which permit transcription of a particular nucleic acid in a host cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid to be transcribed, and a promoter.

[0033] The terms “residue”, “amino acid residue” and “amino acid” are used interchangeably herein to refer to an amino acid that is incorporated into a protein, polypeptide, or peptide (collectively “protein”). The amino acid may be a naturally occurring amino acid and, unless otherwise limited, may encompass non-natural analogs of natural amino acids that can function in a similar manner as naturally occurring amino acids.

[0034] The term “selectively hybridizes” includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences, and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, often 90% sequence identity, or 100% sequence identity (i.e., are complementary) with each other.

[0035] The term “stringent conditions” or “stringent hybridization conditions” includes reference to conditions under which a probe will selectively hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, optionally less than 500 nucleotides in length.

[0036] Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C.
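The SSC strengths above can be converted to molarities from the stated recipe (20×SSC = 3.0 M NaCl/0.3 M trisodium citrate). The minimal sketch below does that arithmetic, assuming each trisodium citrate contributes three Na+ ions; the total Na+ figure is the kind of monovalent-cation molarity used in the Meinkoth and Wahl equation of paragraph [0037]. All function names are hypothetical.

```python
# 20x SSC = 3.0 M NaCl / 0.3 M trisodium citrate, so 1x SSC is 1/20 of each.
NACL_1X = 3.0 / 20       # 0.15 M NaCl
CITRATE_1X = 0.3 / 20    # 0.015 M trisodium citrate


def ssc_composition(strength: float) -> tuple[float, float]:
    """Return (NaCl, trisodium citrate) molarities for SSC of the given strength (e.g. 0.1 for 0.1x)."""
    return strength * NACL_1X, strength * CITRATE_1X


def sodium_molarity(strength: float) -> float:
    """Total Na+ molarity: one Na+ per NaCl plus three per trisodium citrate."""
    nacl, citrate = ssc_composition(strength)
    return nacl + 3 * citrate


def stock_volume(final_volume: float, final_strength: float,
                 stock_strength: float = 20.0) -> float:
    """Volume of concentrated stock needed (C1*V1 = C2*V2), in the units of final_volume."""
    return final_volume * final_strength / stock_strength


if __name__ == "__main__":
    for x in (2.0, 1.0, 0.5, 0.1):
        print(f"{x}x SSC: {sodium_molarity(x):.4f} M Na+")
```

For example, 1×SSC contains about 0.195 M Na+, and 1000 mL of a 0.1×SSC high-stringency wash can be made from 5 mL of 20× stock.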

[0037] Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the $T_m$ can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:267-284 (1984):

$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,(\log M) + 0.41\,(\%\,\mathrm{GC}) - 0.61\,(\%\,\mathrm{form}) - 500/L$$

where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The $T_m$ is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. $T_m$ is reduced by about 1° C. for each 1% of mismatching; thus, $T_m$, hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with ≥90% identity are sought, the $T_m$ can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point ($T_m$) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the thermal melting point ($T_m$); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the thermal melting point ($T_m$); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the thermal melting point ($T_m$). Using the equation, hybridization and wash compositions, and desired $T_m$, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a $T_m$ of less than 45° C. (aqueous solution) or 32° C. (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2 “Overview of principles of hybridization and the strategy of nucleic acid probe assays”, Elsevier, N.Y. (1993); and Current Protocols in Molecular Biology, Chapter 2, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
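The Meinkoth and Wahl estimate and the 1° C. per 1% mismatch rule can be evaluated directly. This is a minimal sketch assuming the logarithm is base 10 (the usual reading of this formula) and M is the monovalent-cation molarity; the function names and example probe values are hypothetical.

```python
import math


def tm_dna_dna(na_molar: float, gc_percent: float, formamide_percent: float,
               length_bp: int) -> float:
    """Meinkoth and Wahl (1984) estimate of T_m in degrees C for a DNA-DNA hybrid.

    Assumes log is base 10 and na_molar is the monovalent-cation molarity M.
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("cation molarity and hybrid length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


def tm_for_identity(tm_perfect: float, identity_percent: float) -> float:
    """Lower T_m by about 1 degree C for each 1% of mismatching."""
    return tm_perfect - (100.0 - identity_percent)


if __name__ == "__main__":
    # Hypothetical probe: 0.195 M Na+ (1x SSC), 50% GC, 50% formamide, 500 bp hybrid
    tm = tm_dna_dna(0.195, 50, 50, 500)
    print(f"T_m ~ {tm:.1f} C; for >=90% identity ~ {tm_for_identity(tm, 90):.1f} C")
```

Subtracting a further offset (for example about 5° C. for stringent conditions) from the result gives a candidate hybridization or wash temperature, as described above.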

[0038] As used herein, “transgenic plant” includes reference to a plant which comprises within its genome a heterologous polynucleotide. Generally, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid, including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic. The term “transgenic” as used herein does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition, or spontaneous mutation. As used herein, a “responsive cell” refers to a cell that exhibits a positive response to the introduction of a mutM polypeptide or mutM polynucleotide, compared to a cell into which no mutM polypeptide or mutM polynucleotide has been introduced. The response can be enhanced efficiency of targeted gene modifications, increased frequency of homologous recombination, increased transformation efficiency or increased recovery of regenerated plants.

[0039] As used herein, “vector” includes reference to a nucleic acid used in the introduction of a polynucleotide of the present invention into a host cell. Vectors are often replicons. Expression vectors permit transcription of a nucleic acid inserted therein.

[0040] The following terms are used to describe the sequence relationships between a polynucleotide/polypeptide of the present invention and a reference polynucleotide/polypeptide: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, and (d) “percentage of sequence identity”.

[0041] (a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison with a polynucleotide/polypeptide of the present invention. A reference sequence may be a subset or the entirety of a specified sequence; for example, a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.

[0042] (b) As used herein, “comparison window” includes reference to a contiguous and specified segment of a polynucleotide/polypeptide sequence, wherein the polynucleotide/polypeptide sequence may be compared to a reference sequence and wherein the portion of the polynucleotide/polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides/amino acid residues in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that, to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide/polypeptide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.
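Given two sequences already aligned over a comparison window (gaps written as "-"), identity and a gap-penalized match score can be computed column by column. This is a minimal sketch under stated assumptions: percent identity is taken as identical columns divided by all columns in the window (one common convention; others divide by ungapped columns or by the shorter sequence), and the per-gap-position penalty is a hypothetical parameter. The function name is also hypothetical.

```python
GAP = "-"


def window_stats(aligned_query: str, aligned_ref: str,
                 gap_penalty: float = 1.0) -> tuple[float, float]:
    """Return (percent identity, gap-penalized score) over an aligned comparison window."""
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("comparison window is empty")
    columns = list(zip(aligned_query, aligned_ref))
    matches = sum(1 for a, b in columns if a == b and a != GAP)
    gap_positions = sum(1 for a, b in columns if GAP in (a, b))
    identity = 100.0 * matches / len(columns)
    # The gap penalty is subtracted from the number of matches, as described above.
    return identity, matches - gap_penalty * gap_positions


if __name__ == "__main__":
    print(window_stats("ACGT-A", "ACGTTA"))
```

Here five of six columns are identical (about 83.3% identity), and the single gap column lowers the score from 5 to 4.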

[0043] Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444 (1988); or by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif.; and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG®, Accelrys Inc., San Diego, Calif., USA). The CLUSTAL program is well described by Higgins and Sharp, Gene 73:237-244 (1988); Higgins and Sharp, CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Research 16:10881-90 (1988); Huang et al., Computer Applications in the Biosciences 8:155-65 (1992); and Pearson et al., Methods in Molecular Biology 24:307-331 (1994).
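The Needleman and Wunsch (global) and Smith and Waterman (local) algorithms share one dynamic-programming recurrence; the local version floors every cell at zero and reports the best cell anywhere in the matrix. The minimal sketch below returns only the optimal score, with a linear gap penalty; the match, mismatch and gap values are illustrative assumptions, not parameters of any of the programs named above.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1, local: bool = False) -> int:
    """Best alignment score by dynamic programming with a linear gap penalty.

    local=False: Needleman-Wunsch global score (end-to-end alignment).
    local=True:  Smith-Waterman local score (cells floored at zero; best cell wins).
    """
    cols = len(b) + 1
    # Row for the empty prefix of a: leading gaps are penalized only in global mode.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)
                best = max(best, score)
            cur[j] = score
        prev = cur
    return best if local else prev[-1]


if __name__ == "__main__":
    print(alignment_score("ACGT", "AGT"), alignment_score("TTACGTT", "GGACGTGG", local=True))
```

Recovering the alignment itself would additionally require a traceback through the matrix; only two rows are kept here because the score alone is needed.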

[0044] The BLAST family of programs that can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for translated nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against translated nucleotide database sequences; and TBLASTX for translated nucleotide query sequences against translated nucleotide database sequences. See Current Protocols in Molecular Biology, Chapter 19, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).

[0045] Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information. This algorithm involves first identifying high-scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915).
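The seeding and extension steps described above can be sketched in Python as follows. This is a simplified, ungapped illustration under stated assumptions: words are exact matches of length W (as in nucleotide searches, so the neighborhood threshold T is not modelled), scoring uses the BLASTN values M=5 and N=-4 given in the text, and the X-drop value of 20 is an arbitrary choice for the example. It is not NCBI's implementation, which includes further refinements such as gapped extension.

```python
def word_hits(query: str, subject: str, w: int = 11):
    """Return (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def _extend(query, subject, i, j, step, score, match, mismatch, x_drop):
    """Extend one direction from (i, j); return (best cumulative score, residues added)."""
    best, best_n, n = score, 0, 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        n += 1
        if score > best:
            best, best_n = score, n
        # Halt when the score falls X below its maximum, or drops to zero or below.
        if best - score >= x_drop or score <= 0:
            break
        i += step
        j += step
    return best, best_n


def ungapped_hsp(query, subject, qi, sj, w=11, match=5, mismatch=-4, x_drop=20):
    """Extend the seed at (qi, sj) both ways; return (score, (q_start, s_start, length))."""
    seed = sum(match if a == b else mismatch
               for a, b in zip(query[qi:qi + w], subject[sj:sj + w]))
    right, right_n = _extend(query, subject, qi + w, sj + w, 1, seed, match, mismatch, x_drop)
    total, left_n = _extend(query, subject, qi - 1, sj - 1, -1, right, match, mismatch, x_drop)
    return total, (qi - left_n, sj - left_n, w + left_n + right_n)
```

Each extension keeps only the prefix that reached the maximum score, so trailing mismatches that triggered the X-drop stop are trimmed from the reported HSP.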

[0046] In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which indicates the probability with which a match between two nucleotide or amino acid sequences would occur by chance.
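As an illustration of the kind of statistic involved, the Karlin-Altschul expectation for a single HSP of score S between sequences of lengths m and n is E = K·m·n·exp(−λS), and, treating the number of chance HSPs as Poisson, the probability of at least one is P = 1 − exp(−E). The sketch below covers only this single-HSP case (the sum statistics behind P(N) combine several HSPs), and the λ and K values in the usage are hypothetical; real values depend on the scoring matrix and gap costs.

```python
import math


def expected_chance_hsps(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S) for one HSP score S."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


def chance_probability(e_value: float) -> float:
    """Probability of at least one chance HSP, assuming a Poisson count: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


# Hypothetical parameters: lambda = 0.3, K = 0.1, a 300-residue query, a 10**6-residue database.
e = expected_chance_hsps(50, 300, 1_000_000, lam=0.3, k=0.1)
p = chance_probability(e)
```

For small E the two measures nearly coincide, which is why E-values and P-values are often used interchangeably for strong hits.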

[0047] BLAST searches assume that proteins can be modeled as random sequences. However, many real proteins comprise regions of nonrandom sequence, which may be homopolymeric tracts, short-period repeats, or regions enriched in one or more amino acids. Such low-complexity regions may be aligned between unrelated proteins even though other regions of the protein are entirely dissimilar. A number of low-complexity filter programs can be employed to reduce such low-complexity alignments. For example, the SEG (Wootton and Federhen, Comput. Chem. 17:149-163 (1993)) and XNU (Claverie and States, Comput. Chem. 17:191-201 (1993)) low-complexity filters can be employed alone or in combination.
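The idea behind low-complexity filtering can be illustrated with a simple compositional-entropy mask. The sketch below is a deliberately simplified stand-in, not SEG or XNU: it slides a window along the sequence and replaces residues with "X" wherever the Shannon entropy of the window composition falls below a threshold. The window length of 12 and threshold of 2.2 bits are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2, mask: str = "X") -> str:
    """Mask every residue lying in a window whose entropy is below `threshold`."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            masked[start:start + window] = mask * window
    return "".join(masked)
```

A homopolymeric tract such as twelve glutamines has zero entropy and is fully masked, whereas a window of twelve different residues has an entropy of about 3.58 bits and is left intact; masked residues then cannot seed spurious alignments.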

[0048] GAP can also be used to compare a polynucleotide or polypeptide of the present invention with a reference sequence. GAP uses the algorithm of Needleman and Wunsch (J. Mol. Biol. 48:443-453, 1970) to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps. It allows for the provision of a gap creation penalty and a gap extension penalty in units of matched bases. For each gap it inserts, GAP must gain a number of matches equal to the gap creation penalty. If a gap extension penalty greater than zero is chosen, GAP must, in addition, gain for each inserted gap a number of matches equal to the length of the gap times the gap extension penalty. The default gap creation and gap extension penalty values for protein sequences in Version 10 of the Wisconsin Genetics Software Package (GCG®, Accelrys Inc., San Diego, Calif., USA) are 8 and 2, respectively. For nucleotide sequences, the default gap creation penalty is 50 and the default gap extension penalty is 3. The gap creation and gap extension penalties can be expressed as an integer selected from the group of integers consisting of from 0 to 200. Thus, for example, the gap creation and gap extension penalties can each independently be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 65 or greater.
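The gap accounting described above can be worked through in a short Python sketch: a gap of length L costs the creation penalty plus L times the extension penalty, in units of matched bases. With the protein defaults (8 and 2), a 3-residue gap costs 8 + 3 × 2 = 14 matches; with the nucleotide defaults (50 and 3), it costs 59. The `simple_quality` function is a hypothetical toy that scores identical columns as +1 and mismatches as 0, whereas GAP itself scores columns with a scoring matrix.

```python
import re


def gap_penalty(length: int, creation: int, extension: int) -> int:
    """Matches a gap must pay for: creation penalty plus length times extension penalty."""
    return creation + length * extension


def simple_quality(a: str, b: str, creation: int = 8, extension: int = 2) -> int:
    """Toy quality of a pre-aligned pair: +1 per identical column minus each gap's penalty."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    # Each maximal run of '-' in either row is one gap of that length.
    penalty = sum(gap_penalty(len(run.group()), creation, extension)
                  for seq in (a, b) for run in re.finditer(r"-+", seq))
    return matches - penalty
```

For the alignment "ACGT-ACGT" / "ACGTTACGT", the eight matches do not pay for the single one-base gap under the protein defaults (8 − 10 = −2), which illustrates why an aligner using these penalties would insert such a gap only if it bought at least ten additional matches.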

[0049] GAP presents one member of the family of best alignments. There may be many members of this family, but no other member has a better quality. GAP displays four figures of merit for alignments: Quality, Ratio, Identity, and Similarity. The Quality is the metric maximized in order to align the sequences. Ratio is the Quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the symbols that actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for a pair of symbols is greater than or equal to 0.50, the similarity threshold. The default scoring matrices used in Version 10 of the Wisconsin Genetics Software Package (GCG®, Accelrys Inc., San Diego, Calif., USA) are BLOSUM62 for polypeptide comparisons (see Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915) and NWSGAPDNA for polynucleotide comparisons.
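The Identity and Similarity figures, with symbols opposite gaps ignored and the 0.50 similarity threshold, can be sketched in Python as below. The scoring function here is a hypothetical toy (1.0 for identical residues, 0.6 within a few made-up residue groups, −0.5 otherwise) standing in for a real matrix such as BLOSUM62; only the counting logic reflects the description above.

```python
# Hypothetical residue groups used only to build a toy scoring function.
SIMILAR_GROUPS = ("ILMV", "FWY", "KR", "DE", "ST", "NQ")


def toy_pair_score(x: str, y: str) -> float:
    """Illustrative pair score: 1.0 identical, 0.6 same group, -0.5 otherwise."""
    if x == y:
        return 1.0
    if any(x in group and y in group for group in SIMILAR_GROUPS):
        return 0.6
    return -0.5


def identity_and_similarity(a: str, b: str, score=toy_pair_score, threshold: float = 0.50):
    """Percent identity and similarity over aligned columns, ignoring columns opposite a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to score")
    identical = sum(1 for x, y in columns if x == y)
    similar = sum(1 for x, y in columns if score(x, y) >= threshold)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)
```

For "MKVLE-D" aligned with "MRILEAD", the gap column is dropped, leaving six columns: four are identical (about 66.7% identity), and K/R and V/I clear the threshold, giving 100% similarity.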

[0050] Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using the BLAST 2.0 suite of programs using default parameters (Altschul et al., Nucleic Acids Res. 25:3389-3402, 1997; Altschul et al., J. Mol. Biol. 215:403-410, 1990) or to the value obtained using the GAP program using default parameters (see the Wisconsin Genetics Software Package, Genetics Computer Group (GCG®, Accelrys Inc., San Diego, Calif., USA)).

[0051] (c) As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences includes reference to the residues in the two sequences which are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins, it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences which differ by such conservative substitutions are said to have “sequence similarity” or “similarity”. Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Meyers and Miller, Computer Applic. Biol. Sci. 4:11-17 (1988), e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).
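The partial-credit adjustment can be illustrated with a minimal Python sketch for two gap-free aligned sequences: identical residues score 1, non-conservative substitutions score 0, and conservative substitutions receive a fractional credit. The residue groups and the default credit of 0.5 are hypothetical choices for illustration; they are not the Meyers and Miller scheme.

```python
# Hypothetical conservative-substitution groups (similar charge or hydrophobicity).
CONSERVATIVE_GROUPS = ("ILMV", "FWY", "KRH", "DE", "ST", "NQ")


def adjusted_percent_identity(a: str, b: str, conservative_credit: float = 0.5) -> float:
    """Score identical residues 1, conservative substitutions a partial credit, others 0."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty, equal-length, gap-free aligned sequences")
    if not 0.0 <= conservative_credit <= 1.0:
        raise ValueError("partial credit must lie between zero and 1")
    total = 0.0
    for x, y in zip(a, b):
        if x == y:
            total += 1.0
        elif any(x in g and y in g for g in CONSERVATIVE_GROUPS):
            total += conservative_credit
    return 100.0 * total / len(a)
```

For "MKVL" versus "MRIL", plain identity is 50%, but the conservative K/R and V/I substitutions each add half a point, raising the adjusted value to 75%.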

[0052] (d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
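The calculation defined in this paragraph can be sketched in Python as follows, assuming the two sequences have already been optimally aligned and padded with "-" to the same window length. Reading "total number of positions in the window" as every aligned column, gap columns included, is an interpretive assumption of this sketch.

```python
def percent_sequence_identity(aligned: str, reference: str) -> float:
    """Matched positions divided by all positions in the comparison window, times 100."""
    if len(aligned) != len(reference) or not aligned:
        raise ValueError("sequences must be aligned to the same non-zero window length")
    matched = sum(1 for x, y in zip(aligned, reference) if x == y and x != "-")
    return 100.0 * matched / len(aligned)
```

For example, "ACGT-ACGTA" against "ACGTTACGTA" has 9 matched positions in a 10-position window, giving 90% identity.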

DETAILED DESCRIPTION OF THE INVENTION

[0053] Overview

[0054] It is expected that mutM polynucleotides of the present invention can be used to improve plant transformation efficiency, improve targeted gene modification frequency and/or increase the frequency of homologous recombination. This mutM homologue could be useful as a suppressor of mutations induced by oxidative damage to DNA. Furthermore, it could be used to reduce illegitimate recombination in maize, thereby increasing the frequency of homologous recombination and transformation in maize. In addition, while not being bound by any particular theory, because mutM is known to be involved in the removal of oxidative DNA damage that gives rise to mismatches, overexpression of mutM is expected to improve overall mismatch repair activity and, in turn, chimeraplasty.

[0055] The present invention provides, among other things, compositions and methods for modulating (i.e., increasing or decreasing) the level of polynucleotides and polypeptides of the present invention in plants. In particular, the polynucleotides and polypeptides of the present invention can be expressed temporally or spatially, e.g., at developmental stages, in tissues, and/or in quantities, which are uncharacteristic of non-recombinantly engineered plants. Thus, the present invention provides utility in such exemplary applications as suppression of mutations and suppression of illegitimate recombination.

[0056] The present invention also provides isolated nucleic acids comprising polynucleotides of sufficient length and complementarity to a gene of the present invention to use as probes or amplification primers in the detection, quantitation, or isolation of gene transcripts. For example, isolated nucleic acids of the present invention can be used as probes in detecting deficiencies in the level of mRNA in screenings for desired transgenic plants, for detecting mutations in the gene (e.g., substitutions, deletions, or additions), for monitoring upregulation of expression or changes in enzyme activity in screening assays of compounds, for detection of any number of allelic variants (polymorphisms), orthologs, or paralogs of the gene, or for site directed mutagenesis in eukaryotic cells (see, e.g., U.S. Pat. No. 5,565,350). The isolated nucleic acids of the present invention can also be used for recombinant expression of their encoded polypeptides, or for use as immunogens in the preparation and/or screening of antibodies. The isolated nucleic acids of the present invention can also be employed for use in sense or antisense suppression of one or more genes of the present invention in a host cell, tissue, or plant. Attachment of chemical agents which bind, intercalate, cleave and/or crosslink to the isolated nucleic acids of the present invention can also be used to modulate transcription or translation.

[0057] The present invention also provides isolated proteins comprising a polypeptide of the present invention (e.g., preproenzyme, proenzyme, or enzymes). The present invention also provides proteins comprising at least one epitope from a polypeptide of the present invention. The proteins of the present invention can be employed in assays for enzyme agonists or antagonists of enzyme function, or for use as immunogens or antigens to obtain antibodies specifically immunoreactive with a protein of the present invention. Such antibodies can be used in assays for expression levels, for identifying and/or isolating nucleic acids of the present invention from expression libraries, for identification of homologous polypeptides from other species, or for purification of polypeptides of the present invention.

[0058] The isolated nucleic acids and polypeptides of the present invention can be used over a broad range of plant types, including monocots such as the species of the family Gramineae including Hordeum, Secale, Triticum, Sorghum (e.g., S. bicolor), Oryza, Avena and Zea (e.g., Z. mays). The isolated nucleic acids and proteins of the present invention can also be used in species from the genera: Cucurbita, Rosa, Vitis, Juglans, Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum, Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Pisum, Phaseolus, Carthamus, Gossypium and Lolium.

[0059] Nucleic Acids

[0060] The present invention provides, among other things, isolated nucleic acids of RNA, DNA, and analogs and/or chimeras thereof, comprising a polynucleotide of the present invention.

[0061] A polynucleotide of the present invention is inclusive of:

[0062] (a) a polynucleotide encoding a polypeptide of SEQ ID NO: 2, including exemplary polynucleotides of SEQ ID NO: 1; the polynucleotide sequences of the invention also include the maize mutM polynucleotide sequence as contained in a plasmid deposited with the American Type Culture Collection (ATCC) and assigned Accession Number PTA-832;

[0063] (b) a polynucleotide which is the product of amplification from a Zea mays nucleic acid library using primer pairs which selectively hybridize under stringent conditions to loci within a polynucleotide selected from the group consisting of SEQ ID NO: 1 or the sequence as contained in the ATCC deposit assigned Accession Number PTA-832;

[0064] (c) a polynucleotide which selectively hybridizes to a polynucleotide of (a) or (b);

[0065] (d) a polynucleotide having a specified sequence identity with polynucleotides of (a), (b), or (c);

[0066] (e) a polynucleotide encoding a protein having a specified number of contiguous amino acids from a prototype polypeptide, wherein the protein is specifically recognized by antisera elicited by presentation of the protein and wherein the protein does not detectably immunoreact to antisera that have been fully immunosorbed with the protein;

[0067] (f) complementary sequences of polynucleotides of (a), (b), (c), (d), or (e); and

[0068] (g) a polynucleotide comprising at least a specific number of contiguous nucleotides from a polynucleotide of (a), (b), (c), (d), (e), or (f).

[0069] The polynucleotide of SEQ ID NO: 1 is contained in a plasmid deposited with American Type Culture Collection (ATCC) on Oct. 8, 1999 and assigned Accession Number PTA-832. American Type Culture Collection is located at 10801 University Blvd., Manassas, Va. 20110-2209.

[0070] The ATCC deposit will be maintained under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purposes of Patent Procedure. This deposit is provided as a convenience to those of skill in the art and is not an admission that a deposit is required under 35 U.S.C. Section 112. The deposited sequence, as well as the polypeptides encoded by the sequence, are incorporated herein by reference and control in the event of any conflict, such as a sequencing error, with the description in this application.

[0071] A. Polynucleotides Encoding a Polypeptide of the Present Invention

[0072] As indicated in (a), above, the present invention provides isolated nucleic acids comprising a polynucleotide of the present invention, wherein the polynucleotide encodes a polypeptide of the present invention. Every nucleic acid sequence herein that encodes a polypeptide also, by reference to the genetic code, describes every possible silent variation of the nucleic acid. One of ordinary skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine; and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Thus, each silent variation of a nucleic acid which encodes a polypeptide of the present invention is implicit in each described polypeptide sequence and is within the scope of the present invention. Accordingly, the present invention includes polynucleotides of SEQ ID NO: 1, and the sequences as contained in the ATCC deposit assigned Accession Number PTA-832, and polynucleotides encoding a polypeptide of SEQ ID NO: 2.
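The notion that an encoding sequence implicitly describes all of its silent variations can be illustrated with a short Python sketch built on the standard genetic code (NCBI translation table 1); other codes, such as mitochondrial codes, differ. It confirms that ATG (Met) and TGG (Trp) are the only codons for their amino acids and enumerates every DNA sequence encoding the same short peptide. Because the number of variants grows multiplicatively with length, full enumeration is practical only for short sequences.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Amino acid -> every codon that encodes it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codons(dna: str):
    """Split a DNA coding sequence into codons."""
    if len(dna) % 3:
        raise ValueError("length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(dna.upper()))


def silent_variants(dna: str):
    """Yield every DNA sequence that encodes the same polypeptide as `dna`."""
    choices = [SYNONYMS[CODON_TABLE[c]] for c in codons(dna.upper())]
    for combo in product(*choices):
        yield "".join(combo)
```

For ATGGCTTTA (Met-Ala-Leu), there are 1 × 4 × 6 = 24 silent variants, all translating to "MAL".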

[0073] B. Polynucleotides Amplified from a Zea mays Nucleic Acid Library

[0074] As indicated in (b), above, the present invention provides an isolated nucleic acid comprising a polynucleotide of the present invention, wherein the polynucleotides are amplified from a Zea mays nucleic acid library. Zea mays lines B73, PHRE1, A632, BMS-P2#10, W23, and Mo17 are known and publicly available. Other publicly known and available maize lines can be obtained from the Maize Genetics Cooperation (Urbana, Ill.). The nucleic acid library may be a cDNA library, a genomic library, or a library generally constructed from nuclear transcripts at any stage of intron processing. cDNA libraries can be normalized to increase the representation of relatively rare cDNAs. In optional embodiments, the cDNA library is constructed using a full-length cDNA synthesis method. Examples of such methods include Oligo-Capping (Maruyama, K. and Sugano, S. Gene 138:171-174, 1994), Biotinylated CAP Trapper (Carninci, P., Kvam, C. et al. Genomics 37:327-336, 1996), and CAP Retention Procedure (Edery, I., Chu, L. L., et al. Molecular and Cellular Biology 15:3363-3371, 1995). cDNA synthesis is often catalyzed at 50-55° C. to prevent formation of RNA secondary structure. Examples of reverse transcriptases that are relatively stable at these temperatures are SUPERSCRIPT II Reverse Transcriptase (Life Technologies, Inc.), AMV Reverse Transcriptase (Boehringer Mannheim) and RetroAmp Reverse Transcriptase (Epicentre). Rapidly growing tissues or rapidly dividing cells are preferably used as mRNA sources.

[0075] The present invention also provides subsequences of the polynucleotides of the present invention. A variety of subsequences can be obtained using primers which selectively hybridize under stringent conditions to at least two sites within a polynucleotide of the present invention, or to two sites within the nucleic acid which flank and comprise a polynucleotide of the present invention, or to a site within a polynucleotide of the present invention and a site within the nucleic acid which comprises it. Primers are chosen to selectively hybridize, under stringent hybridization conditions, to a polynucleotide of the present invention. Generally, the primers are complementary to a subsequence of the target nucleic acid which they amplify but may have a sequence identity ranging from about 85% to 99% relative to the polynucleotide sequence which they are designed to anneal to. As those skilled in the art will appreciate, the sites to which the primer pairs will selectively hybridize are chosen such that a single contiguous nucleic acid can be formed under the desired amplification conditions.

[0076] In optional embodiments, the primers will be constructed so that they selectively hybridize under stringent conditions to a sequence (or its complement) within the target nucleic acid which comprises the codon encoding the carboxy or amino terminal amino acid residue (i.e., the 3′ terminal coding region and 5′ terminal coding region, respectively) of the polynucleotides of the present invention. Optionally within these embodiments, the primers will be constructed to selectively hybridize entirely within the coding region of the target polynucleotide of the present invention such that the product of amplification of a cDNA target will consist of the coding region of that cDNA. The primer length in nucleotides is selected from the group of integers consisting of from at least 15 to 50. Thus, the primers can be at least 15, 18, 20, 25, 30, 40, or 50 nucleotides in length. Those of skill will recognize that a lengthened primer sequence can be employed to increase specificity of binding (i.e., annealing) to a target sequence. A non-annealing sequence at the 5′ end of a primer (a “tail”) can be added, for example, to introduce a cloning site at the terminal ends of the amplicon.
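The primer layout described above can be sketched in Python: the forward primer is taken from the 5′ terminal coding region (starting at the initiation codon) and the reverse primer is the reverse complement of the 3′ terminal coding region, each with an optional non-annealing 5′ tail. The function name, the example coding sequence, and the EcoRI site (GAATTC) used as a tail are hypothetical; real primer design would also consider melting temperature, secondary structure, and specificity.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def coding_region_primers(cds: str, length: int = 20, forward_tail: str = "", reverse_tail: str = ""):
    """Forward primer: first `length` nt of the coding region (5' terminal coding region).
    Reverse primer: reverse complement of the last `length` nt (3' terminal coding region).
    Optional 5' tails, e.g. cloning sites, are prepended and do not anneal to the template."""
    if not 15 <= length <= 50:
        raise ValueError("annealing length should be 15 to 50 nucleotides")
    if len(cds) < length:
        raise ValueError("coding region is shorter than the primer")
    forward = forward_tail + cds[:length]
    reverse = reverse_tail + reverse_complement(cds[-length:])
    return forward, reverse


# Hypothetical 48-nt coding region: start codon, 14 Ala codons, stop codon.
cds = "ATG" + "GCT" * 14 + "TAA"
fwd, rev = coding_region_primers(cds, length=18, forward_tail="GAATTC")
```

Because both primers lie entirely within the coding region, the amplicon from a cDNA template spans exactly the coding sequence plus any added tails.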

[0077] The amplification products can be translated using expression systems well known to those of skill in the art and as discussed, infra. The resulting translation products can be confirmed as polypeptides of the present invention by, for example, assaying for the appropriate catalytic activity (e.g., specific activity and/or substrate specificity), or verifying the presence of one or more linear epitopes which are specific to a polypeptide of the present invention. Methods for protein synthesis from PCR derived templates are known in the art and available commercially. See, e.g., Amersham Life Sciences, Inc, Catalog '97, p.354.

[0078] Methods for obtaining 5′ and/or 3′ ends of a vector insert are well known in the art. See, e.g., RACE (Rapid Amplification of cDNA Ends) as described in Frohman, M. A., in PCR Protocols: A Guide to Methods and Applications, M. A. Innis, D. H. Gelfand, J. J. Sninsky, T. J. White, Eds. (Academic Press, Inc., San Diego), pp. 28-38 (1990); see also U.S. Pat. No. 5,470,722, and Current Protocols in Molecular Biology, Unit 15.6, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995); Frohman and Martin, Techniques 1:165 (1989).

[0079] C. Polynucleotides Which Selectively Hybridize to a Polynucleotide of (A) or (B)

[0080] As indicated in (c), above, the present invention provides isolated nucleic acids comprising polynucleotides of the present invention, wherein the polynucleotides selectively hybridize, under selective hybridization conditions, to a polynucleotide of sections (A) or (B) as discussed above. Thus, the polynucleotides of this embodiment can be used for isolating, detecting, and/or quantifying nucleic acids comprising the polynucleotides of (A) or (B). For example, polynucleotides of the present invention can be used to identify, isolate, or amplify partial or full-length clones in a deposited library. In some embodiments, the polynucleotides are genomic or cDNA sequences isolated or otherwise complementary to a cDNA from a dicot or monocot nucleic acid library. Exemplary species of monocots and dicots include, but are not limited to: maize, canola, soybean, cotton, wheat, sorghum, sunflower, alfalfa, oats, sugar cane, millet, barley, and rice. Optionally, the cDNA library comprises at least 30% to 95% full-length sequences (for example, at least 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% full-length sequences). The cDNA libraries can be normalized to increase the representation of rare sequences. Low stringency hybridization conditions are typically, but not exclusively, employed with sequences having a reduced sequence identity relative to complementary sequences. Moderate and high stringency conditions can optionally be employed for sequences of greater identity. Low stringency conditions allow selective hybridization of sequences having about 70% to 80% sequence identity and can be employed to identify orthologous or paralogous sequences.

[0081] D. Polynucleotides Having a Specific Sequence Identity with the Polynucleotides of (A), (B) or (C)

[0082] As indicated in (d), above, the present invention provides isolated nucleic acids comprising polynucleotides of the present invention, wherein the polynucleotides have a specified identity at the nucleotide level to a polynucleotide as disclosed above in sections (A), (B), or (C). Identity can be calculated using, for example, the BLAST or GAP algorithms under default conditions. The percentage of identity to a reference sequence is at least 60% and, rounded upwards to the nearest integer, can be expressed as an integer selected from the group of integers consisting of from 60 to 99. Thus, for example, the percentage of identity to a reference sequence can be at least 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.

[0083] Optionally, the polynucleotides of this embodiment will encode a polypeptide that will share an epitope with a polypeptide encoded by the polynucleotides of sections (A), (B), or (C). Thus, these polynucleotides encode a first polypeptide which elicits production of antisera comprising antibodies which are specifically reactive to a second polypeptide encoded by a polynucleotide of (A), (B), or (C). However, the first polypeptide does not bind to antisera raised against itself when the antisera has been fully immunosorbed with the first polypeptide. Hence, the polynucleotides of this embodiment can be used to generate antibodies for use in, for example, the screening of expression libraries for nucleic acids comprising polynucleotides of (A), (B), or (C), or for purification of, or in immunoassays for, polypeptides encoded by the polynucleotides of (A), (B), or (C). The polynucleotides of this embodiment embrace nucleic acid sequences which can be employed for selective hybridization to a polynucleotide encoding a polypeptide of the present invention.

[0084] Screening polypeptides for specific binding to antisera can be conveniently achieved using peptide display libraries. This method involves the screening of large collections of peptides for individual members having the desired function or structure. Antibody screening of peptide display libraries is well known in the art. The displayed peptide sequences can be from 3 to 5000 or more amino acids in length, frequently from 5-100 amino acids long, and often from about 8 to 15 amino acids long. In addition to direct chemical synthetic methods for generating peptide libraries, several recombinant DNA methods have been described. One type involves the display of a peptide sequence on the surface of a bacteriophage or cell. Each bacteriophage or cell contains the nucleotide sequence encoding the particular displayed peptide sequence. Such methods are described in PCT patent publication Nos. 91/17271, 91/18980, 91/19818, and 93/08278. Other systems for generating libraries of peptides have aspects of both in vitro chemical synthesis and recombinant methods. See, PCT Patent publication Nos. 92/05258, 92/14843, and 97/20078. See also, U.S. Pat. Nos. 5,658,754; and 5,643,768. Peptide display libraries, vectors, and screening kits are commercially available from such suppliers as Invitrogen (Carlsbad, Calif.).

[0085] E. Polynucleotides Encoding a Protein That Has a Subsequence from a Prototype Polypeptide and Is Cross-Reactive to the Prototype Polypeptide

[0086] As indicated in (e), above, the present invention provides isolated nucleic acids comprising polynucleotides of the present invention, wherein the polynucleotides encode a protein having a subsequence of contiguous amino acids from a prototype polypeptide of the present invention such as are provided in (a), above. The length of contiguous amino acids from the prototype polypeptide is selected from the group of integers consisting of from at least 10 to the number of amino acids within the prototype sequence. Thus, for example, the polynucleotide can encode a polypeptide having a subsequence having at least 10, 15, 20, 25, 30, 35, 40, 45, or 50 contiguous amino acids from the prototype polypeptide. Further, the number of such subsequences encoded by a polynucleotide of the instant embodiment can be any integer selected from the group consisting of from 1 to 20, such as 2, 3, 4, or 5. The subsequences can be separated by any integer number of nucleotides from 1 to the number of nucleotides in the sequence, such as at least 5, 10, 15, 25, 50, 100, or 200 nucleotides.

[0087] The proteins encoded by polynucleotides of this embodiment, when presented as an immunogen, elicit the production of polyclonal antibodies which specifically bind to a prototype polypeptide such as, but not limited to, a polypeptide encoded by the polynucleotide of (a) or (b), above. Generally, however, a protein encoded by a polynucleotide of this embodiment does not bind to antisera raised against the prototype polypeptide when the antisera has been fully immunosorbed with the prototype polypeptide. Methods of making and assaying for antibody binding specificity/affinity are well known in the art. Exemplary immunoassay formats include ELISA, competitive immunoassays, radioimmunoassays, Western blots, indirect immunofluorescent assays and the like.

[0088] In one assay method, fully immunosorbed and pooled antisera elicited to the prototype polypeptide can be used in a competitive binding assay to test the protein. The concentration of the prototype polypeptide required to inhibit 50% of the binding of the antisera to the prototype polypeptide is determined. If the amount of the protein required to inhibit 50% of the binding is less than twice the amount of the prototype polypeptide required, then the protein is said to specifically bind to the antisera elicited to the immunogen. Accordingly, the proteins of the present invention embrace allelic variants, conservatively modified variants, and minor recombinant modifications to a prototype polypeptide.
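The two-fold criterion of this assay can be expressed as a small Python sketch. It assumes the 50% inhibitory concentration is estimated by linear interpolation on log concentration between the two measured points that bracket 50% inhibition; that interpolation scheme is an assumption of the sketch (curve fitting, for example to a four-parameter logistic, is a common alternative), and the example data are invented.

```python
import math


def ic50(concentrations, percent_inhibition):
    """Concentration giving 50% inhibition, by log-linear interpolation.

    Expects concentrations in ascending order with inhibition increasing."""
    points = list(zip(concentrations, percent_inhibition))
    for (c1, p1), (c2, p2) in zip(points, points[1:]):
        if p1 <= 50 <= p2 and p2 > p1:
            frac = (50 - p1) / (p2 - p1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("50% inhibition is not bracketed by the data")


def specifically_binds(ic50_test_protein: float, ic50_prototype: float) -> bool:
    """Criterion from the text: less than twice the prototype amount is needed."""
    return ic50_test_protein < 2 * ic50_prototype


# Invented example: the prototype inhibits 30% at 10 units and 70% at 100 units.
prototype_ic50 = ic50([1, 10, 100, 1000], [10, 30, 70, 90])
```

With these invented numbers the prototype's value is about 31.6 units, so a test protein needing 50 units would meet the criterion (below about 63.2 units) while one needing 70 units would not.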

[0089] A polynucleotide of the present invention optionally encodes a protein having a molecular weight as the non-glycosylated protein within 20% of the molecular weight of the full-length non-glycosylated polypeptides of the present invention. Molecular weight can be readily determined by SDS-PAGE under reducing conditions. Optionally, the molecular weight is within 15% of a full length polypeptide of the present invention, or within 10% or 5%, or optionally within 3%, 2%, or 1% of a full length polypeptide of the present invention.
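As a complement to SDS-PAGE, a theoretical molecular weight can be computed from sequence and compared against the percentage windows above. The Python sketch below sums commonly tabulated average residue masses plus one water and ignores glycosylation and other modifications, consistent with comparing non-glycosylated proteins. Using a computed mass in place of an SDS-PAGE measurement is an assumption of this sketch, and the repeat-based test sequence is invented.

```python
# Average residue masses in daltons (free amino acid minus one water), as commonly tabulated.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153


def average_mass(protein: str) -> float:
    """Approximate average mass of an unmodified polypeptide: residues plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def within_percent(candidate: float, reference: float, percent: float) -> bool:
    """True if `candidate` lies within `percent` of `reference`."""
    return abs(candidate - reference) <= reference * percent / 100.0


full_length = "MKVLEDAGHRT" * 10          # invented 110-residue sequence
truncated = full_length[:-15]             # drops 15 C-terminal residues
```

Here the truncated form retains roughly 86% of the full-length mass and so falls within the 20% window, whereas a 50-residue fragment would not.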

[0090] Optionally, the polynucleotides of this embodiment will encode a protein having a specific enzymatic activity at least 50%, 60%, 80%, or 90% of that of a cellular extract comprising the native, endogenous full-length polypeptide of the present invention. Further, the proteins encoded by polynucleotides of this embodiment will optionally have a substantially similar affinity constant (K_m) and/or catalytic activity (i.e., the microscopic rate constant, k_cat) as the native endogenous, full-length protein. Those of skill in the art will recognize that the k_cat/K_m value determines the specificity for competing substrates and is often referred to as the specificity constant. Proteins of this embodiment can have a k_cat/K_m value at least 10% of that of a full-length polypeptide of the present invention as determined using the endogenous substrate of that polypeptide. Optionally, the k_cat/K_m value will be at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% of the k_cat/K_m value of the full-length polypeptide of the present invention. The values of k_cat, K_m, and k_cat/K_m can be determined by any number of means well known to those of skill in the art. For example, the initial rates (i.e., the first 5% or less of the reaction) can be determined using rapid mixing and sampling techniques (e.g., continuous-flow, stopped-flow, or rapid quenching techniques), flash photolysis, or relaxation methods (e.g., temperature jumps) in conjunction with such exemplary methods of measuring as spectrophotometry, spectrofluorimetry, nuclear magnetic resonance, or radioactive procedures. Kinetic values are conveniently obtained using a Lineweaver-Burk or Eadie-Hofstee plot.
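The Lineweaver-Burk analysis mentioned above rests on rewriting the Michaelis-Menten equation v = V_max[S]/(K_m + [S]) as 1/v = (K_m/V_max)(1/[S]) + 1/V_max, so a straight line through (1/[S], 1/v) has slope K_m/V_max and intercept 1/V_max; k_cat then follows as V_max divided by total enzyme concentration. The Python sketch below fits that line by ordinary least squares. The substrate concentrations, V_max = 10, K_m = 2, and enzyme concentration of 0.01 are invented, noise-free values used only to show that the fit recovers the parameters.

```python
def michaelis_menten_rate(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def lineweaver_burk(substrate, rates):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax by least squares; return (Km, Vmax)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


def specificity_constant(vmax: float, km: float, enzyme_conc: float):
    """k_cat = Vmax / [E]total; returns (k_cat, k_cat / K_m)."""
    kcat = vmax / enzyme_conc
    return kcat, kcat / km


s = [0.5, 1.0, 2.0, 4.0, 8.0]
v = [michaelis_menten_rate(x, vmax=10.0, km=2.0) for x in s]
km, vmax = lineweaver_burk(s, v)                 # recovers Km = 2, Vmax = 10
kcat, kcat_over_km = specificity_constant(vmax, km, enzyme_conc=0.01)
```

A variant's k_cat/K_m divided by the full-length value gives the percentage compared against the thresholds above. With real, noisy data the double-reciprocal transform weights low-[S] points heavily, which is one reason direct nonlinear fitting is often preferred.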

[0091] F. Polynucleotides Complementary to the Polynucleotides of (A)-(E)

[0092] As indicated in (f), above, the present invention provides isolated nucleic acids comprising polynucleotides complementary to the polynucleotides of sections (A)-(E), above. As those of skill in the art will recognize, complementary sequences base-pair throughout the entirety of their length with the polynucleotides of sections (A)-(E) (i.e., they are 100% complementary over their entire length). Complementary bases associate through hydrogen bonding in double-stranded nucleic acids. For example, the following base pairs are complementary: guanine and cytosine; adenine and thymine; and adenine and uracil.
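Because the two strands of a duplex are antiparallel, a complementary polynucleotide is conventionally written 5′ to 3′ as the reverse complement of the original. A minimal sketch, assuming unmodified sequences containing only A, C, G, and T (or U for RNA) with no ambiguity codes; the function names are hypothetical.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the complementary strand written 5'->3' (antiparallel)."""
    pair = RNA_PAIR if rna else DNA_PAIR
    return "".join(pair[base] for base in reversed(seq.upper()))


def is_full_complement(a: str, b: str, rna: bool = False) -> bool:
    """True if b base-pairs with a over the entire length of both strands."""
    return len(a) == len(b) and reverse_complement(a, rna) == b.upper()


print(reverse_complement("ATGGCC"))            # DNA partner strand
print(reverse_complement("AUGGCC", rna=True))  # RNA partner strand
```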

[0093] G. Polynucleotides Which are Subsequences of the Polynucleotides of (A)-(F)

[0094] As indicated in (g), above, the present invention provides isolated nucleic acids comprising polynucleotides which comprise at least 15 contiguous bases from the polynucleotides of sections (A) through (F) as discussed above. The length of the polynucleotide is given as an integer selected from the group consisting of from at least 15 to the length of the nucleic acid sequence from which the polynucleotide is derived. Thus, for example, polynucleotides of the present invention are inclusive of polynucleotides comprising at least 15, 20, 25, 30, 40, 50, 60, 75, or 100 contiguous nucleotides from the polynucleotides of (A)-(F). Optionally, the number of such subsequences encoded by a polynucleotide of the instant embodiment can be any integer selected from the group consisting of from 1 to 20, such as 2, 3, 4, or 5. The subsequences can be separated by any integer number of nucleotides from 1 to the number of nucleotides in the sequence, such as at least 5, 10, 15, 25, 50, 100, or 200 nucleotides.
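The "at least 15 contiguous bases" criterion can be checked mechanically as a substring test with a length floor. The sketch below is illustrative only: the parent sequence and function names are hypothetical, and it searches a single strand, whereas the sequences of (A)-(F) include complements, so a fuller check would also search the reverse complement.

```python
def windows(seq: str, length: int) -> list[str]:
    """All contiguous subsequences of exactly `length` bases."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def is_qualifying_subsequence(candidate: str, parent: str, min_len: int = 15) -> bool:
    """True if candidate is at least min_len bases and occurs contiguously in parent."""
    return len(candidate) >= min_len and candidate.upper() in parent.upper()


parent = "ATGGCTAGCAAGGAGGAACTTTTCACTGGAGTT"  # hypothetical 33-nt sequence
print(len(windows(parent, 15)))                         # 33 - 15 + 1 windows
print(is_qualifying_subsequence(parent[5:25], parent))  # 20 contiguous bases
print(is_qualifying_subsequence(parent[:12], parent))   # shorter than 15 bases
```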

[0095] The subsequences of the present invention can comprise structural characteristics of the sequence from which they are derived. Alternatively, the subsequences can lack certain structural characteristics of the larger sequence from which they are derived, such as a poly(A) tail. Optionally, a subsequence from a polynucleotide encoding a polypeptide having at least one linear epitope in common with a prototype polypeptide sequence as provided in (a), above, may encode an epitope in common with the prototype sequence. Alternatively, the subsequence may not encode an epitope in common with the prototype sequence but can be used to isolate the larger sequence by, for example, nucleic acid hybridization with the sequence from which it is derived. Subsequences can be used to modulate or detect gene expression by introducing into the subsequences compounds which bind, intercalate, cleave and/or crosslink to nucleic acids. Exemplary compounds include acridine, psoralen, phenanthroline, naphthoquinone, daunomycin or chloroethylaminoaryl conjugates.

[0096] Construction of Nucleic Acids

[0097] The isolated nucleic acids of the present invention can be made using (a) standard recombinant methods, (b) synthetic techniques, or combinations thereof. In some embodiments, the polynucleotides of the present invention will be cloned, amplified, or otherwise constructed from a monocot. In optional embodiments the monocot is Zea mays.

[0098] The nucleic acids may conveniently comprise sequences in addition to a polynucleotide of the present invention. For example, a multi-cloning site comprising one or more endonuclease restriction sites may be inserted into the nucleic acid to aid in isolation of the polynucleotide. Also, translatable sequences may be inserted to aid in the isolation of the translated polynucleotide of the present invention. For example, a hexa-histidine marker sequence provides a convenient means to purify the proteins of the present invention. A polynucleotide of the present invention can be attached to a vector, adapter, or linker for cloning and/or expression of a polynucleotide of the present invention. Additional sequences may be added to such cloning and/or expression sequences to optimize their function in cloning and/or expression, to aid in isolation of the polynucleotide, or to improve the introduction of the polynucleotide into a cell. Typically, the length of a nucleic acid of the present invention less the length of its polynucleotide of the present invention is less than 20 kilobase pairs, often less than 15 kb, and frequently less than 10 kb. Use of cloning vectors, expression vectors, adapters, and linkers is well known and extensively described in the art. For a description of various nucleic acids see, for example, Stratagene Cloning Systems, Catalogs 1995, 1996, 1997 (La Jolla, Calif.); and, Amersham Life Sciences, Inc, Catalog '97 (Arlington Heights, Ill.).

[0099] A. Recombinant Methods for Constructing Nucleic Acids

[0100] The isolated nucleic acid compositions of this invention, such as RNA, cDNA, genomic DNA, or a hybrid thereof, can be obtained from plant biological sources using any number of cloning methodologies known to those of skill in the art. In some embodiments, oligonucleotide probes which selectively hybridize, under stringent conditions, to the polynucleotides of the present invention are used to identify the desired sequence in a cDNA or genomic DNA library. Isolation of RNA and construction of cDNA and genomic libraries are well known to those of ordinary skill in the art. See, e.g., Plant Molecular Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997); and, Current Protocols in Molecular Biology, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).

[0101] A number of cDNA synthesis protocols have been described which provide substantially pure full-length cDNA libraries. Substantially pure full-length cDNA libraries are constructed to comprise at least 90%, or at least 93% to 95%, full-length inserts amongst clones containing inserts. The length of insert in such libraries can be from 0 to 8, 9, 10, 11, 12, 13, or more kilobase pairs. Vectors to accommodate inserts of these sizes are known in the art and available commercially. See, e.g., Stratagene's lambda ZAP Express (cDNA cloning vector with 0 to 12 kb cloning capacity). An exemplary method of constructing a greater than 95% pure full-length cDNA library is described by Carninci et al., Genomics 37:327-336 (1996). Other methods for producing full-length libraries are known in the art. See, e.g., Edery et al., Mol. Cell Biol. 15(6):3363-3371 (1995); and, PCT Application WO 96/34981.
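The purity figure above is a ratio taken only over clones that carry an insert, so empty-vector clones are excluded from the denominator. A minimal sketch of that arithmetic, assuming each clone has already been scored as full-length or truncated by some means outside the sketch (the `full_length` flag and the example library are hypothetical).

```python
from dataclasses import dataclass


@dataclass
class Clone:
    insert_kb: float   # insert size; 0 means an empty vector
    full_length: bool  # hypothetical annotation, e.g. from 5'-end sequencing


def full_length_fraction(clones: list[Clone]) -> float:
    """Fraction of full-length inserts among clones that contain an insert."""
    with_insert = [c for c in clones if c.insert_kb > 0]
    if not with_insert:
        return 0.0
    return sum(c.full_length for c in with_insert) / len(with_insert)


# Hypothetical library: 19 full-length, 1 truncated, 2 empty vectors.
library = [Clone(1.8, True)] * 19 + [Clone(0.6, False)] + [Clone(0.0, False)] * 2
fraction = full_length_fraction(library)
print(f"{fraction:.0%} full-length; meets 90% threshold: {fraction >= 0.90}")
```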

[0102] A1. Normalized or Subtracted cDNA Libraries

[0103] A non-normalized cDNA library represents the mRNA population of the tissue it was made from. Because low-abundance clones are outnumbered by clones derived from highly expressed genes, their isolation can be laborious. Normalization of a cDNA library is the process of creating a library in which each clone is more equally represented. Construction of normalized libraries is described in Ko, Nucl. Acids. Res., 18(19):5705-5711 (1990); Patanjali et al., Proc. Natl. Acad. Sci. USA 88:1943-1947 (1991); U.S. Pat. Nos. 5,482,685, and 5,637,685. In an exemplary method described by Soares et al., Proc. Natl. Acad. Sci. USA 91:9228-9232 (1994), normalization narrowed the range of clone abundances from four orders of magnitude to only one order of magnitude.
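The spread of clone abundances described above can be expressed as the base-10 logarithm of the ratio between the most and least abundant clones. A minimal sketch; the clone counts are invented for illustration and do not come from Soares et al.

```python
import math


def abundance_span_orders(clone_counts: list[int]) -> float:
    """Spread of clone abundances in orders of magnitude: log10(max / min)."""
    observed = [c for c in clone_counts if c > 0]
    return math.log10(max(observed) / min(observed))


before = [1, 30, 800, 10000]  # hypothetical counts per distinct transcript
after = [5, 12, 30, 50]       # hypothetical counts after normalization
print(abundance_span_orders(before), abundance_span_orders(after))
```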

[0104] Subtracted cDNA libraries are another means to increase the proportion of less abundant cDNA species. In this procedure, cDNA prepared from one pool of mRNA is depleted of sequences present in a second pool of mRNA by hybridization. The cDNA:mRNA hybrids are removed and the remaining un-hybridized cDNA pool is enriched for sequences unique to that pool. See, Foote et al. in, Plant Molecular Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997); Kho and Zarbl, Technique 3(2):58-63 (1991); Sive and St. John, Nucl. Acids Res. 16(22):10937 (1988); Current Protocols in Molecular Biology, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995); and, Swaroop et al., Nucl. Acids Res. 19(8):1954 (1991). cDNA subtraction kits are commercially available. See, e.g., PCR-Select (Clontech, Palo Alto, Calif.).

[0105] To construct genomic libraries, large segments of genomic DNA are generated by fragmentation, e.g. using restriction endonucleases, and are ligated with vector DNA to form concatemers that can be packaged into the appropriate vector. Methodologies to accomplish these ends, and sequencing methods to verify the sequence of nucleic acids, are well known in the art. Examples of appropriate molecular biological techniques and instructions sufficient to direct persons of skill through many construction, cloning, and screening methodologies are found in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Vols. 1-3 (1989); Methods in Enzymology, Vol. 152: Guide to Molecular Cloning Techniques, Berger and Kimmel, Eds., San Diego: Academic Press, Inc. (1987); Current Protocols in Molecular Biology, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995); and Plant Molecular Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997). Kits for construction of genomic libraries are also commercially available.

[0106] The cDNA or genomic library can be screened using a probe based upon the sequence of a polynucleotide of the present invention such as those disclosed herein. Probes may be used to hybridize with genomic DNA or cDNA sequences to isolate homologous genes in the same or different plant species. Those of skill in the art will appreciate that various degrees of stringency of hybridization can be employed in the assay; and either the hybridization or the wash medium can be stringent.

[0107] The nucleic acids of interest can also be amplified from nucleic acid samples using amplification techniques. For instance, polymerase chain reaction (PCR) technology can be used to amplify the sequences of polynucleotides of the present invention and related genes directly from genomic DNA or cDNA libraries. PCR and other in vitro amplification methods may also be useful, for example, to clone nucleic acid sequences that code for proteins to be expressed, to make nucleic acids to use as probes for detecting the presence of the desired mRNA in samples, for nucleic acid sequencing, or for other purposes. The T4 gene 32 protein (Boehringer Mannheim) can be used to improve yield of long PCR products.
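To illustrate how a primer pair defines the amplified region, the product expected from a template can be predicted by locating the forward primer on the top strand and the reverse complement of the reverse primer downstream of it. This is a minimal sketch assuming exact primer matches on a single template strand; real amplification tolerates some mismatches and depends on annealing conditions. The function names and sequences are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def predict_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Return the expected PCR product for exact primer matches, else None.

    fwd is written 5'->3' as it appears on the template's top strand; rev is
    written 5'->3' as ordered, so its reverse complement lies on the top strand.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    site = reverse_complement(rev)
    end = template.find(site, start + len(fwd))  # must lie downstream of fwd
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "GGG" + "ATGAAA" + "CCCCCC" + "TTTGCA" + "GGG"
print(predict_amplicon(template, fwd="ATGAAA", rev="TGCAAA"))
```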

[0108] PCR-based screening methods have been described. Wilfinger et al., BioTechniques 22(3):481-486 (1997), describe a PCR-based method in which the longest cDNA is identified in the first step so that incomplete clones can be eliminated from study. Such methods are particularly effective in combination with a full-length cDNA construction methodology, above.

[0109] B. Synthetic Methods for Constructing Nucleic Acids

[0110] The isolated nucleic acids of the present invention can also be prepared by direct chemical synthesis by methods such as the phosphotriester method of Narang et al., Meth. Enzymol. 68:90-99 (1979); the phosphodiester method of Brown et al., Meth. Enzymol. 68:109-151 (1979); the diethylphosphoramidite method of Beaucage et al., Tetra. Lett. 22:1859-1862 (1981); the solid phase phosphoramidite triester method described by Beaucage and Caruthers, Tetra. Letts. 22(20):1859-1862 (1981), e.g., using an automated synthesizer, e.g., as described in Needham-VanDevanter et al., Nucleic Acids Res. 12:6159-6168 (1984); and, the solid support method of U.S. Pat. No. 4,458,066. Chemical synthesis generally produces a single-stranded oligonucleotide. This may be converted into double-stranded DNA by hybridization with a complementary sequence, or by polymerization with a DNA polymerase using the single strand as a template. One of skill will recognize that while chemical synthesis of DNA is best employed for sequences of about 100 bases or less, longer sequences may be obtained by the ligation of shorter sequences.
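Obtaining a longer sequence by ligating shorter ones starts with dividing the target into pieces within the practical synthesis length. The sketch below only plans near-equal contiguous top-strand pieces of at most 100 bases (an assumed default mirroring the figure above); it deliberately leaves out the complementary bridging oligonucleotides and end chemistry that an actual ligation-based assembly would also require. The function name and target are hypothetical.

```python
import math


def split_for_synthesis(seq: str, max_len: int = 100) -> list[str]:
    """Split seq into near-equal contiguous pieces no longer than max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not seq:
        return []
    n_pieces = math.ceil(len(seq) / max_len)
    base, extra = divmod(len(seq), n_pieces)
    pieces, pos = [], 0
    for i in range(n_pieces):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        pieces.append(seq[pos:pos + size])
        pos += size
    return pieces


gene = "ATGC" * 62 + "AT"  # hypothetical 250-nt target
parts = split_for_synthesis(gene)
print([len(p) for p in parts], "".join(parts) == gene)
```

Balancing the piece sizes avoids ending with a very short final fragment, which would otherwise be produced by cutting every 100 bases.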

[0111] Recombinant Expression Cassettes

[0112] The present invention further provides recombinant expression cassettes comprising a nucleic acid of the present invention. A nucleic acid sequence coding for the desired polypeptide of the present invention, for example a cDNA or a genomic sequence encoding a full-length polypeptide of the present invention, can be used to construct a recombinant expression cassette which can be introduced into the desired host cell. A recombinant expression cassette will typically comprise a polynucleotide of the present invention operably linked to transcriptional initiation regulatory sequences which will direct the transcription of the polynucleotide in the intended host cell, such as tissues of a transformed plant.

[0113] For example, plant expression vectors may include (1) a cloned plant gene under the transcriptional control of 5′ and 3′ regulatory sequences and (2) a selectable marker. Such plant expression vectors may also contain, if desired, a promoter regulatory region (e.g., one conferring inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific/selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.

[0114] A plant promoter fragment can be employed which will direct expression of a polynucleotide of the present invention in all tissues of a regenerated plant. Such promoters are referred to herein as “constitutive” promoters and are active under most environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1′- or 2′-promoter derived from T-DNA of Agrobacterium tumefaciens, the ubiquitin 1 promoter, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP1-8 promoter, and other transcription initiation regions from various plant genes known to those of skill. One exemplary promoter is the ubiquitin promoter, which can be used to drive expression of a polynucleotide of the present invention in maize embryos or embryogenic callus.

[0115] Alternatively, the plant promoter can direct expression of a polynucleotide of the present invention in a specific tissue or may be otherwise under more precise environmental or developmental control. Such promoters are referred to here as “inducible” promoters. Environmental conditions that may affect transcription by inducible promoters include pathogen attack, anaerobic conditions, or the presence of light. Examples of inducible promoters are the Adh1 promoter, which is inducible by hypoxia or cold stress; the Hsp70 promoter, which is inducible by heat stress; and the PPDK promoter, which is inducible by light.

[0116] Examples of promoters under developmental control include promoters that initiate transcription only, or preferentially, in certain tissues, such as leaves, roots, fruit, seeds, or flowers. Exemplary promoters include the anther-specific promoter 5126 (U.S. Pat. Nos. 5,689,049 and 5,689,051), the glob-1 promoter, and the gamma-zein promoter. The operation of a promoter may also vary depending on its location in the genome. Thus, an inducible promoter may become fully or partially constitutive in certain locations.

[0117] Both heterologous and non-heterologous (i.e., endogenous) promoters can be employed to direct expression of the nucleic acids of the present invention. These promoters can also be used, for example, in recombinant expression cassettes to drive expression of antisense nucleic acids to reduce, increase, or alter concentration and/or composition of the proteins of the present invention in a desired tissue. Thus, in some embodiments, the nucleic acid construct will comprise a promoter functional in a plant cell, such as in Zea mays, operably linked to a polynucleotide of the present invention. Promoters useful in these embodiments include the endogenous promoters driving expression of a polypeptide of the present invention.

[0118] In some embodiments, isolated nucleic acids which serve as promoter or enhancer elements can be introduced in the appropriate position (generally upstream) of a non-heterologous form of a polynucleotide of the present invention so as to up- or down-regulate expression of a polynucleotide of the present invention. For example, endogenous promoters can be altered in vivo by mutation, deletion, and/or substitution (see, Kmiec, U.S. Pat. No. 5,565,350; Zarling et al., PCT/US93/03868), or isolated promoters can be introduced into a plant cell in the proper orientation and distance from a gene of the present invention so as to control the expression of the gene. Gene expression can be modulated under conditions suitable for plant growth so as to alter the total concentration and/or alter the composition of the polypeptides of the present invention in a plant cell. Thus, the present invention provides compositions, and methods for making, heterologous promoters and/or enhancers operably linked to a native, endogenous (i.e., non-heterologous) form of a polynucleotide of the present invention.

[0119] If polypeptide expression is desired, it is generally desirable to include a polyadenylation region at the 3′-end of a polynucleotide coding region. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA. The 3′ end sequence to be added can be derived from, for example, the nopaline synthase or octopine synthase genes, or alternatively from another plant gene, or from any other eukaryotic gene.

[0120] An intron sequence can be added to the 5′ untranslated region or the coding sequence of the partial coding sequence to increase the amount of the mature message that accumulates in the cytosol. Inclusion of a spliceable intron in the transcription unit in both plant and animal expression constructs has been shown to increase gene expression at both the mRNA and protein levels up to 1000-fold. Buchman and Berg, Mol. Cell Biol. 8:4395-4405 (1988); Callis et al., Genes Dev. 1:1183-1200 (1987). Such intron enhancement of gene expression is typically greatest when the intron is placed near the 5′ end of the transcription unit. The use of maize introns such as Adh1-S introns 1, 2, and 6 and the Bronze-1 intron is known in the art. See generally, The Maize Handbook, Chapter 116, Freeling and Walbot, Eds., Springer, N.Y. (1994). The vector comprising the sequences from a polynucleotide of the present invention will typically comprise a marker gene which confers a selectable phenotype on plant cells. Typical vectors useful for expression of genes in higher plants are well known in the art and include vectors derived from the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens described by Rogers et al., Meth. in Enzymol., 153:253-277 (1987).

[0121] A polynucleotide of the present invention can be expressed in either sense or anti-sense orientation as desired. It will be appreciated that control of gene expression in either sense or anti-sense orientation can have a direct impact on the observable plant characteristics. Antisense technology can be conveniently used to inhibit gene expression in plants. To accomplish this, a nucleic acid segment from the desired gene is cloned and operably linked to a promoter such that the anti-sense strand of RNA will be transcribed. The construct is then transformed into plants and the antisense strand of RNA is produced. In plant cells, it has been shown that antisense RNA inhibits gene expression by preventing the accumulation of mRNA which encodes the enzyme of interest, see, e.g., Sheehy et al., Proc. Nat'l. Acad. Sci. USA 85:8805-8809 (1988); and Hiatt et al., U.S. Pat. No. 4,801,340.

[0122] Another method of suppression is sense suppression. Introduction of nucleic acid configured in the sense orientation has been shown to be an effective means by which to block the transcription of target genes. For an example of the use of this method to modulate expression of endogenous genes see, Napoli et al., The Plant Cell 2:279-289 (1990) and U.S. Pat. No. 5,034,323.

[0123] Catalytic RNA molecules or ribozymes can also be used to inhibit expression of plant genes. It is possible to design ribozymes that specifically pair with virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers RNA-cleaving activity upon them, thereby increasing the activity of the constructs. The design and use of target RNA-specific ribozymes is described in Haseloff et al., Nature 334:585-591 (1988).

[0124] A variety of cross-linking agents, alkylating agents and radical-generating species as pendant groups on polynucleotides of the present invention can be used to bind, label, detect, and/or cleave nucleic acids. For example, Vlassov, V. V. et al., Nucleic Acids Res. (1986) 14:4065-4076, describe covalent bonding of a single-stranded DNA fragment with alkylating derivatives of nucleotides complementary to target sequences. A report of similar work by the same group is that by Knorre, D. G. et al., Biochimie (1985) 67:785-789. Iverson and Dervan also showed sequence-specific cleavage of single-stranded DNA mediated by incorporation of a modified nucleotide which was capable of activating cleavage (J Am Chem Soc (1987) 109:1241-1243). Meyer, R. B. et al., J Am Chem Soc (1989) 111:8517-8519, effect covalent crosslinking to a target nucleotide using an alkylating agent complementary to the single-stranded target nucleotide sequence. A photoactivated crosslinking to single-stranded oligonucleotides mediated by psoralen was disclosed by Lee, B. L. et al., Biochemistry (1988) 27:3197-3203. Use of crosslinking in triple-helix forming probes was also disclosed by Horne et al., J Am Chem Soc (1990) 112:2435-2437. Use of N4,N4-ethanocytosine as an alkylating agent to crosslink to single-stranded oligonucleotides has also been described by Webb and Matteucci, J Am Chem Soc (1986) 108:2764-2765; Nucleic Acids Res (1986) 14:7661-7674; Feteritz et al., J. Am. Chem. Soc. 113:4000 (1991). Various compounds to bind, detect, label, and/or cleave nucleic acids are known in the art. See, for example, U.S. Pat. Nos. 5,543,507; 5,672,593; 5,484,908; 5,256,648; and, 5,681,941.

[0125] Proteins

[0126] MutM proteins are involved in repair of oxidative damage to DNA caused by a variety of environmental agents. The DNA glycosylase activity of MutM proteins is essential to effect this repair. All known members of the MutM family have several conserved amino acids, indicating the importance of these amino acid residues to the enzymatic function of these proteins. While retaining the active-site lysine and several conserved amino acids of the MMH family, the polypeptide of the present invention also contains the unique features of a putative nuclear localization signal and alternating acidic and basic residues in the C-terminal region of the protein. These structural motifs are highlighted in Example 4. It is expected that mutM expression will reduce illegitimate recombination in maize, thereby increasing the frequency of homologous recombination and transformation in maize, and that, based on its mismatch repair activity, mutM can be used to induce targeted gene changes via chimeraplasty.

[0127] The isolated proteins of the present invention comprise a polypeptide having at least 10 amino acids encoded by any one of the polynucleotides of the present invention as discussed more fully, above, or polypeptides which are conservatively modified variants thereof. The proteins of the present invention or variants thereof can comprise any number of contiguous amino acid residues from a polypeptide of the present invention, wherein that number is selected from the group of integers consisting of from 10 to the number of residues in a full-length polypeptide of the present invention. Optionally, this subsequence of contiguous amino acids is at least 15, 20, 25, 30, 35, or 40 amino acids in length, often at least 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 380, 381, 382, 383, or 384 amino acids in length. Further, the number of such subsequences can be any integer selected from the group consisting of from 1 to 20, such as 2, 3, 4, or 5.

[0128] The present invention further provides a protein comprising a polypeptide having a specified sequence identity with a polypeptide of the present invention. The percentage of sequence identity is an integer selected from the group consisting of from 50 to 99. Exemplary sequence identity values include 60%, 65%, 70%, 75%, 80%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% and 99%. Sequence identity can be determined using, for example, the GAP or BLAST algorithms.
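Programs such as GAP or BLAST first produce an alignment; percent identity is then a count over that alignment. The sketch below does not reimplement either program: it takes an already-aligned pair as input and applies one common convention (identical positions divided by alignment columns, with gaps counted as mismatches). Programs differ in the denominator they use, so this convention is an assumption, and the alignment shown is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues / alignment columns, gaps counted as mismatches.

    Columns in which both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


pid = percent_identity("MKT-LV", "MKTALI")  # hypothetical pairwise alignment
print(f"{pid:.1f}% identity; meets 60% threshold: {pid >= 60}")
```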

[0129] As those of skill will appreciate, the present invention includes catalytically active polypeptides of the present invention (i.e., enzymes). Catalytically active polypeptides have a specific activity of at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 95% that of the native (non-synthetic), endogenous polypeptide. Further, the substrate specificity (k_(cat)/K_(m)) is optionally substantially similar to that of the native (non-synthetic), endogenous polypeptide. Typically, the K_(m) will be at least 30%, 40%, or 50% that of the native (non-synthetic), endogenous polypeptide, and up to at least 60%, 70%, 80%, or 90%. Methods of assaying and quantifying measures of enzymatic activity and substrate specificity (k_(cat)/K_(m)) are well known to those of skill in the art.

[0130] Generally, the proteins of the present invention will, when presented as an immunogen, elicit production of an antibody specifically reactive to a polypeptide of the present invention. Further, the proteins of the present invention will not bind to antisera raised against a polypeptide of the present invention which has been fully immunosorbed with the same polypeptide. Immunoassays for determining binding are well known to those of skill in the art; one such immunoassay is a competitive immunoassay as discussed, infra. Thus, the proteins of the present invention can be employed as immunogens for constructing antibodies immunoreactive to a protein of the present invention for such exemplary utilities as immunoassays or protein purification techniques.

[0131] Expression of Proteins in Host Cells

[0132] Using the nucleic acids of the present invention, one may express a protein of the present invention in a recombinantly engineered cell such as bacteria, yeast, insect, mammalian, or plant cells. The cells produce the protein in a non-natural condition (e.g., in quantity, composition, location, and/or time), because they have been genetically altered through human intervention to do so.

[0133] It is expected that those of skill in the art are knowledgeable in the numerous expression systems available for expression of a nucleic acid encoding a protein of the present invention. No attempt to describe in detail the various methods known for the expression of proteins in prokaryotes or eukaryotes will be made.

[0134] In brief summary, the expression of isolated nucleic acids encoding a protein of the present invention will typically be achieved by operably linking, for example, the DNA or cDNA to a promoter (which is either constitutive or regulatable), followed by incorporation into an expression vector. The vectors can be suitable for replication and integration in either prokaryotes or eukaryotes. Typical expression vectors contain transcription and translation terminators, initiation sequences, and promoters useful for regulation of the expression of the DNA encoding a protein of the present invention. To obtain high-level expression of a cloned gene, it is desirable to construct expression vectors which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational initiation, and a transcription/translation terminator. One of skill would recognize that modifications can be made to a protein of the present invention without diminishing its biological activity. Some modifications may be made to facilitate the cloning, expression, or incorporation of the targeting molecule into a fusion protein. Such modifications are well known to those of skill in the art and include, for example, a methionine added at the amino terminus to provide an initiation site, or additional amino acids (e.g., poly His) placed on either terminus to create conveniently located purification sequences. Restriction sites or termination codons can also be introduced.
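The two modifications named above, an N-terminal methionine initiation codon and a C-terminal poly-His purification sequence, can be illustrated at the DNA level. This is a minimal sketch under stated assumptions: an in-frame coding sequence is supplied, a hexa-histidine tag is built from CAT/CAC codons (both encode histidine), and a TAA stop is appended. Host codon usage, linker residues, and any protease site for later tag removal are not considered, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS6_CODONS = "CATCACCATCACCATCAC"  # six His codons (CAT and CAC both encode His)


def add_met_and_his6(cds: str) -> str:
    """Add an ATG start if missing and a C-terminal His6 tag before a stop codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    if not cds.startswith("ATG"):
        cds = "ATG" + cds             # N-terminal Met as an initiation site
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]                # drop the native stop so the tag is translated
    return cds + HIS6_CODONS + "TAA"  # C-terminal His6 followed by a new stop


print(add_met_and_his6("GCTGCTTAA"))
```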

[0135] Transfection/Transformation of Cells

[0136] The method of transformation/transfection is not critical to the instant invention; various methods of transformation or transfection are currently available. As newer methods become available to transform crops or other host cells, they may be directly applied. Accordingly, a wide variety of methods have been developed to insert a DNA sequence into the genome of a host cell to obtain the transcription and/or translation of the sequence to effect phenotypic changes in the organism. Thus, any method which provides for effective transformation/transfection may be employed.

[0137] A. Plant Transformation

[0138] A DNA sequence coding for the desired polypeptide of the present invention, for example a cDNA or a genomic sequence encoding a full-length protein, will be used to construct a recombinant expression cassette which can be introduced into the desired plant.

[0139] Isolated nucleic acids of the present invention can be introduced into plants according to techniques known in the art. Generally, recombinant expression cassettes as described above and suitable for transformation of plant cells are prepared. The isolated nucleic acids of the present invention can then be used for transformation. In this manner, genetically modified plants, plant cells, plant tissue, seed, and the like can be obtained. Transformation protocols may vary depending on the type of plant cell, i.e. monocot or dicot, targeted for transformation. Suitable methods of transforming plant cells include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (U.S. Pat. No. 5,563,055 and U.S. Pat. No. 5,981,840), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3:2717-2722), and ballistic particle acceleration (see, for example, Sanford et al. U.S. Pat. No. 4,945,050; Tomes et al. “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment” In Gamborg and Phillips (Eds.) Plant Cell, Tissue and Organ Culture: Fundamental Methods, Springer-Verlag, Berlin (1995); and McCabe et al. (1988) Biotechnology 6:923-926). Also see, Weissinger et al. (1988) Annual Rev. Genet. 22:421-477; Sanford et al. (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al. (1988) Plant Physiol. 87:671-674 (soybean); Datta et al. (1990) Biotechnology 8:736-740 (rice); Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85:4305-4309 (maize); Klein et al. (1988) Biotechnology 6:559-563 (maize); Klein et al. (1988) Plant Physiol. 91:440-444 (maize); Fromm et al. (1990) Biotechnology 8:833-839 (maize); Hooykaas-Van Slogteren & Hooykaas (1984) Nature (London) 311:763-764; Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84:5345-5349 (Liliaceae); De Wet et al. (1985) In The Experimental Manipulation of Ovule Tissues ed. G. P. Chapman et al. pp. 197-209, Longman, N.Y. (pollen); Kaeppler et al. (1990) Plant Cell Reports 9:415-418; and Kaeppler et al. (1992) Theor. Appl. Genet. 84:560-566 (whisker-mediated transformation); D'Halluin et al. (1992) Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford (1995) Annals of Botany 75:745-750 (maize via Agrobacterium tumefaciens); all of which are herein incorporated by reference.

[0140] The cells which have been transformed may be grown into plants in accordance with conventional methods. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. These plants may then be grown and pollinated with either the same transformed strain or different strains, and the resulting hybrid having the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that the subject phenotypic characteristic is stably maintained and inherited, and seeds then harvested to ensure that the desired phenotype or other property has been achieved.

[0141] B. Transfection of Prokaryotes, Lower Eukaryotes, and Animal Cells

[0142] Animal and lower eukaryotic (e.g., yeast) host cells are competent or rendered competent for transfection by various means. There are several well-known methods of introducing DNA into animal cells. These include calcium phosphate precipitation, fusion of the recipient cells with bacterial protoplasts containing the DNA, treatment of the recipient cells with liposomes containing the DNA, DEAE-dextran, electroporation, biolistics, and microinjection of the DNA directly into the cells. The transfected cells are cultured by means well known in the art. See Kuchler, R. J., Biochemical Methods in Cell Culture and Virology, Dowden, Hutchinson and Ross, Inc. (1977).

[0143] Synthesis of Proteins

[0144] The proteins of the present invention can be constructed using non-cellular synthetic methods. Solid-phase synthesis of proteins of less than about 50 amino acids in length may be accomplished by attaching the C-terminal amino acid of the sequence to an insoluble support, followed by sequential addition of the remaining amino acids in the sequence. Techniques for solid-phase synthesis are described by Barany and Merrifield, Solid-Phase Peptide Synthesis, pp. 3-284 in The Peptides: Analysis, Synthesis, Biology, Vol. 2: Special Methods in Peptide Synthesis, Part A; Merrifield et al., J. Am. Chem. Soc. 85:2149-2156 (1963); and Stewart et al., Solid Phase Peptide Synthesis, 2nd ed., Pierce Chem. Co., Rockford, Ill. (1984). Proteins of greater length may be synthesized by condensation of the amino and carboxy termini of shorter fragments. Methods of forming peptide bonds by activation of a carboxy-terminal end (e.g., by the use of the coupling reagent N,N′-dicyclohexylcarbodiimide) are known to those of skill.

[0145] Purification of Proteins

[0146] The proteins of the present invention may be purified by standard techniques well known to those of skill in the art. Recombinantly produced proteins of the present invention can be directly expressed or expressed as a fusion protein. The recombinant protein can be purified by a combination of cell lysis (e.g., sonication, French press) and affinity chromatography. For fusion products, subsequent digestion of the fusion protein with an appropriate proteolytic enzyme releases the desired recombinant protein.

[0147] The proteins of this invention, recombinant or synthetic, may be purified to substantial purity by standard techniques well known in the art, including detergent solubilization, selective precipitation with such substances as ammonium sulfate, column chromatography, immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification: Principles and Practice, Springer-Verlag: New York (1982); Deutscher, Guide to Protein Purification, Academic Press (1990). For example, antibodies may be raised to the proteins as described herein. Purification from E. coli can be achieved following procedures described in U.S. Pat. No. 4,511,503. The protein may then be isolated from cells expressing the protein and further purified by standard protein chemistry techniques as described herein. Detection of the expressed protein is achieved by methods known in the art, including, for example, radioimmunoassays, Western blotting techniques, or immunoprecipitation.

[0148] Transgenic Plant Regeneration

[0149] Plant cells transformed with a plant expression vector can be regenerated, e.g., from single cells, callus tissue, or leaf discs according to standard plant tissue culture techniques. It is well known in the art that various cells, tissues, and organs from almost any plant can be successfully cultured to regenerate an entire plant. Plant regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, Macmillan Publishing Company, New York, pp. 124-176 (1983); and Binding, Regeneration of Plants, Plant Protoplasts, CRC Press, Boca Raton, pp. 21-73 (1985).

[0150] The regeneration of plants containing the foreign gene introduced by Agrobacterium from leaf explants can be achieved as described by Horsch et al., Science 227:1229-1231 (1985). In this procedure, transformants are grown in the presence of a selection agent and in a medium that induces the regeneration of shoots in the plant species being transformed, as described by Fraley et al., Proc. Natl. Acad. Sci. USA 80:4803 (1983). This procedure typically produces shoots within two to four weeks, and these transformant shoots are then transferred to an appropriate root-inducing medium containing the selective agent and an antibiotic to prevent bacterial growth. Transgenic plants of the present invention may be fertile or sterile.

[0151] Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al., Ann. Rev. of Plant Phys. 38:467-486 (1987). The regeneration of plants from either single plant protoplasts or various explants is well known in the art. See, for example, Methods for Plant Molecular Biology, A. Weissbach and H. Weissbach, eds., Academic Press, Inc., San Diego, Calif. (1988). This regeneration and growth process includes the steps of selection of transformant cells and shoots, rooting the transformant shoots, and growth of the plantlets in soil. For maize cell culture and regeneration, see generally The Maize Handbook, Freeling and Walbot, Eds., Springer, N.Y. (1994); Corn and Corn Improvement, 3rd edition, Sprague and Dudley, Eds., American Society of Agronomy, Madison, Wis. (1988).

[0152] One of skill will recognize that after the recombinant expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.

[0153] In vegetatively propagated crops, mature transgenic plants can be propagated by taking cuttings or by tissue culture techniques to produce multiple identical plants. Desirable transgenics are selected, and new varieties are obtained and propagated vegetatively for commercial use. In seed-propagated crops, mature transgenic plants can be self-crossed to produce a homozygous inbred plant. The inbred plant produces seed containing the newly introduced heterologous nucleic acid. These seeds can be grown to produce plants that would produce the selected phenotype.

[0154] Parts obtained from the regenerated plant, such as flowers, seeds, leaves, branches, fruit, and the like, are included in the invention, provided that these parts comprise cells comprising the isolated nucleic acid of the present invention. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that they comprise the introduced nucleic acid sequences.

[0155] Transgenic plants expressing the selectable marker can be screened for transmission of the nucleic acid of the present invention by, for example, standard immunoblot and DNA detection techniques. Transgenic lines are also typically evaluated on levels of expression of the heterologous nucleic acid. Expression at the RNA level can be determined initially to identify and quantitate expression-positive plants. Standard techniques for RNA analysis can be employed and include PCR amplification assays using oligonucleotide primers designed to amplify only the heterologous RNA templates and solution hybridization assays using heterologous nucleic acid-specific probes. The RNA-positive plants can then be analyzed for protein expression by Western immunoblot analysis using the specifically reactive antibodies of the present invention. In addition, in situ hybridization and immunocytochemistry according to standard protocols can be done using heterologous nucleic acid-specific polynucleotide probes and antibodies, respectively, to localize sites of expression within transgenic tissue. Generally, a number of transgenic lines are screened for the incorporated nucleic acid to identify and select plants with the most appropriate expression profiles.

[0156] Transgenic plants of the present invention can be homozygous for the added heterologous nucleic acid, i.e., a transgenic plant that contains two added nucleic acid sequences, one gene at the same locus on each chromosome of a chromosome pair. A homozygous transgenic plant can be obtained by sexually mating (selfing) a heterozygous transgenic plant that contains a single added heterologous nucleic acid, germinating some of the seed produced, and analyzing the resulting plants for altered expression of a polynucleotide of the present invention relative to a control plant (i.e., native, non-transgenic). Back-crossing to a parental plant and out-crossing with a non-transgenic plant are also contemplated.
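The expected outcome of selfing a single-insertion plant can be illustrated with a minimal sketch. It assumes one insertion at a single locus, equal transmission of both alleles through egg and pollen, and no selection; the allele labels "T" (added nucleic acid) and "-" (no insertion) and the function name are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_genotype_ratios(parent=("T", "-")):
    """Expected genotype frequencies among progeny of a selfed one-locus parent.

    "T" marks the added transgene allele and "-" the empty allele at the same
    locus (illustrative labels). Each gamete carries one of the two parental
    alleles with equal probability (simple Mendelian model).
    """
    counts = Counter()
    for egg, pollen in product(parent, parent):
        genotype = "".join(sorted(egg + pollen))  # order-independent label
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


ratios = selfing_genotype_ratios()
print(ratios)  # homozygous "TT", hemizygous "-T", null "--"
```

Under these assumptions about one quarter of the progeny from a selfed single-insertion plant are expected to be homozygous, which is why the resulting plants are analyzed before a homozygous line is chosen.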

[0157] Modulating Polypeptide Levels and/or Composition

[0158] The present invention further provides a method for modulating (i.e., increasing or decreasing) the concentration or ratio of the polypeptides of the present invention in a plant or part thereof. Modulation can be effected by increasing or decreasing the concentration and/or the ratio of the polypeptides of the present invention in a plant. The method comprises introducing into a plant cell a recombinant expression cassette comprising a polynucleotide of the present invention as described above to obtain a transformed plant cell, culturing the transformed plant cell under plant cell growing conditions, and inducing or repressing expression of a polynucleotide of the present invention in the plant for a time sufficient to modulate the concentration and/or the ratios of the polypeptides in the plant or plant part.

[0159] In some embodiments, the concentration and/or ratios of polypeptides of the present invention in a plant may be modulated by altering, in vivo or in vitro, the promoter of a gene to up- or down-regulate gene expression. In some embodiments, the coding regions of native genes of the present invention can be altered via substitution, addition, insertion, or deletion to decrease activity of the encoded enzyme. See, e.g., Kmiec, U.S. Pat. No. 5,565,350; Zarling et al., PCT/US93/03868. And in some embodiments, an isolated nucleic acid (e.g., a vector) comprising a promoter sequence is transfected into a plant cell. Subsequently, a plant cell comprising the promoter operably linked to a polynucleotide of the present invention is selected for by means known to those of skill in the art such as, but not limited to, Southern blot, DNA sequencing, or PCR analysis using primers specific to the promoter and to the gene and detecting amplicons produced therefrom. A plant or plant part altered or modified by the foregoing embodiments is grown under plant forming conditions for a time sufficient to modulate the concentration and/or ratios of polypeptides of the present invention in the plant. Plant forming conditions are well known in the art and discussed briefly, supra.

[0160] In general, the concentration or the ratios of the polypeptides are increased or decreased by at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% relative to a native control plant, plant part, or cell lacking the aforementioned recombinant expression cassette. Modulation in the present invention may occur during and/or subsequent to growth of the plant to the desired stage of development. Modulating nucleic acid expression temporally and/or in particular tissues can be controlled by employing the appropriate promoter operably linked to a polynucleotide of the present invention in, for example, sense or antisense orientation as discussed in greater detail, supra. Induction of expression of a polynucleotide of the present invention can also be controlled by exogenous administration of an effective amount of inducing compound. Inducible promoters and inducing compounds which activate expression from these promoters are well known in the art. In some embodiments, the polypeptides of the present invention are modulated in monocots, particularly maize.
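The percentage thresholds above are expressed relative to a native control. A minimal sketch of that arithmetic follows; the function names and the example levels (arbitrary units) are hypothetical and do not come from any measurement described here.

```python
def percent_change(treated, control):
    """Percent change of a measured level relative to a control level."""
    if control == 0:
        raise ValueError("control level must be non-zero")
    return 100.0 * (treated - control) / control


def meets_modulation_threshold(treated, control, threshold_pct):
    """True if the level is increased or decreased by at least threshold_pct."""
    return abs(percent_change(treated, control)) >= threshold_pct


# Hypothetical levels in arbitrary units: a 20% increase and a 50% decrease.
print(percent_change(120, 100), percent_change(50, 100))
```

Taking the absolute value lets the same check cover both up- and down-regulation, matching the "increased or decreased by at least" wording.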

[0161] UTRs and Codon Preference

[0162] In general, translational efficiency has been found to be regulated by specific sequence elements in the 5′ non-coding or untranslated region (5′ UTR) of the RNA. Positive sequence motifs include translational initiation consensus sequences (Kozak, Nucleic Acids Res. 15:8125 (1987)) and the 7-methylguanosine cap structure (Drummond et al., Nucleic Acids Res. 13:7375 (1985)). Negative elements include stable intramolecular 5′ UTR stem-loop structures (Muesing et al., Cell 48:691 (1987)) and AUG sequences or short open reading frames preceded by an appropriate AUG in the 5′ UTR (Kozak, supra; Rao et al., Mol. and Cell. Biol. 8:284 (1988)). Accordingly, the present invention provides 5′ and/or 3′ untranslated regions for modulation of translation of heterologous coding sequences.

[0163] Further, the polypeptide-encoding segments of the polynucleotides of the present invention can be modified to alter codon usage. Altered codon usage can be employed to alter translational efficiency and/or to optimize the coding sequence for expression in a desired host, for example to optimize the codon usage in a heterologous sequence for expression in maize. Codon usage in the coding regions of the polynucleotides of the present invention can be analyzed statistically using commercially available software packages such as “Codon Preference” available from the University of Wisconsin Genetics Computer Group (see Devereux et al., Nucleic Acids Res. 12:387-395 (1984)) or MacVector 4.1 (Eastman Kodak Co., New Haven, Conn.). Thus, the present invention provides a codon usage frequency characteristic of the coding region of at least one of the polynucleotides of the present invention. The number of polynucleotides that can be used to determine a codon usage frequency can be any integer from 1 to the number of polynucleotides of the present invention as provided herein. Optionally, the polynucleotides will be full-length sequences. An exemplary number of sequences for statistical analysis can be at least 1, 5, 10, 20, 50, or 100.
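A codon usage frequency of the kind described above can be tabulated with a short sketch. It counts in-frame codons across one or more coding sequences and reports each codon as a fraction of all counted codons. This is a simplified stand-in for, not a reimplementation of, the statistics reported by packages such as Codon Preference; the function name is hypothetical, and each input is assumed to begin at the first codon of its reading frame.

```python
from collections import Counter


def codon_usage(coding_sequences):
    """Fraction of all counted in-frame codons contributed by each codon.

    Sequences are read from position 0 in steps of three. A trailing partial
    codon and any codon containing a non-ACGT character are skipped. RNA
    input (U) is converted to DNA (T).
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


print(codon_usage(["ATGGCCGCCTAA"]))
```

Pooling counts across sequences, rather than averaging per-sequence frequencies, weights longer coding regions more heavily; either convention could be used for a "codon usage frequency".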

[0164] Sequence Shuffling

[0165] The present invention provides methods for sequence shuffling using polynucleotides of the present invention, and compositions resulting therefrom. Sequence shuffling is described in PCT publication No. WO 97/20078. See also Zhang, J.-H., et al., Proc. Natl. Acad. Sci. USA 94:4504-4509 (1997). Generally, sequence shuffling provides a means for generating libraries of polynucleotides having a desired characteristic which can be selected or screened for. Libraries of recombinant polynucleotides are generated from a population of related sequence polynucleotides which comprise sequence regions which have substantial sequence identity and can be homologously recombined in vitro or in vivo. The population of sequence-recombined polynucleotides comprises a subpopulation of polynucleotides which possess desired or advantageous characteristics and which can be selected by a suitable selection or screening method. The characteristics can be any property or attribute capable of being selected for or detected in a screening system, and may include properties of: an encoded protein, a transcriptional element, a sequence controlling transcription, RNA processing, RNA stability, chromatin conformation, translation, or other expression property of a gene or transgene, a replicative element, a protein-binding element, or the like, such as any feature which confers a selectable or detectable property. In some embodiments, the selected characteristic will be a decreased Km and/or increased kcat over the wild-type protein as provided herein. In other embodiments, a protein or polynucleotide generated from sequence shuffling will have a ligand binding affinity greater than that of the non-shuffled wild-type polynucleotide. The increase in such properties can be at least 110%, 120%, 130%, 140%, or at least 150% of the wild-type value.
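The library-then-screen logic above can be illustrated with an in silico toy model. This sketch is an assumption-laden simplification, not the laboratory procedure of WO 97/20078: it treats recombination as random crossovers between two equal-length, pre-aligned parents, and the `score` function stands in for whatever selection or screening assay is used. All function names are hypothetical.

```python
import random


def recombine(parent_a, parent_b, n_crossovers, rng):
    """Return one chimera of two equal-length, aligned parent sequences.

    Crossover positions are drawn uniformly without replacement; segments
    alternate between the parents, starting with parent_a.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    if not 0 <= n_crossovers < len(parent_a):
        raise ValueError("too many crossovers for sequence length")
    points = sorted(rng.sample(range(1, len(parent_a)), n_crossovers))
    parents = (parent_a, parent_b)
    pieces, source, start = [], 0, 0
    for end in points + [len(parent_a)]:
        pieces.append(parents[source][start:end])
        source = 1 - source  # switch template at each crossover
        start = end
    return "".join(pieces)


def make_library(parent_a, parent_b, size, n_crossovers, seed=0):
    """Build a reproducible library of chimeras from a seeded generator."""
    rng = random.Random(seed)
    return [recombine(parent_a, parent_b, n_crossovers, rng) for _ in range(size)]


def screen(library, score, top_n):
    """Keep the top_n members ranked by a user-supplied score function."""
    return sorted(library, key=score, reverse=True)[:top_n]


library = make_library("AAAAAAAAAA", "TTTTTTTTTT", size=5, n_crossovers=2, seed=1)
print(screen(library, lambda s: s.count("T"), top_n=2))
```

In practice the parents would be homologous genes rather than homopolymers, and the score would be a measured property such as a lower Km or higher kcat.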

[0166] Generic and Consensus Sequences

[0167] Polynucleotides and polypeptides of the present invention further include those having: (a) a generic sequence of at least two homologous polynucleotides or polypeptides, respectively, of the present invention; and (b) a consensus sequence of at least three homologous polynucleotides or polypeptides, respectively, of the present invention. The generic sequence of the present invention comprises each species of polypeptide or polynucleotide embraced by the generic polypeptide or polynucleotide sequence, respectively. The individual species encompassed by a polynucleotide having an amino acid or nucleic acid consensus sequence can be used to generate antibodies or produce nucleic acid probes or primers to screen for homologs in other species, genera, families, orders, classes, phyla, or kingdoms. For example, a polynucleotide having a consensus sequence from a gene family of Zea mays can be used to generate antibody or nucleic acid probes or primers to other Gramineae species such as wheat, rice, or sorghum. Alternatively, a polynucleotide having a consensus sequence generated from orthologous genes can be used to identify or isolate orthologs of other taxa. Typically, a polynucleotide having a consensus sequence will be at least 9, 10, 15, 20, 25, 30, or 40 amino acids in length, or 20, 30, 40, 50, 100, or 150 nucleotides in length. As those of skill in the art are aware, a conservative amino acid substitution can be used for amino acids which differ amongst aligned sequences but are from the same conservative substitution group as discussed above. Optionally, no more than 1 or 2 conservative amino acids are substituted for each 10 amino acid length of consensus sequence.
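A simple way to derive a consensus from already-aligned homologous sequences is a majority rule per column. The sketch below assumes the sequences were aligned beforehand (for example, by the multiple alignment software named in this description) and uses "X" for columns without a strict majority; it does not model the conservative-substitution groups mentioned above. The function name and placeholder characters are illustrative choices.

```python
from collections import Counter


def consensus(aligned, gap="-", ambiguous="X"):
    """Majority-rule consensus of equal-length aligned sequences.

    A column's consensus residue is the most common non-gap residue if it
    occurs in more than half of the sequences; otherwise `ambiguous` is used.
    Columns containing only gaps yield `gap`.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    result = []
    for column in zip(*aligned):
        residues = Counter(c for c in column if c != gap)
        if not residues:
            result.append(gap)
            continue
        residue, count = residues.most_common(1)[0]
        result.append(residue if count > len(column) / 2 else ambiguous)
    return "".join(result)


print(consensus(["MKVL", "MKIL", "MRVL"]))
```

Requiring a strict majority of all sequences, gaps included, is a conservative choice; a looser rule (plurality, or a threshold on non-gap residues only) would call fewer "X" positions.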

[0168] Similar sequences used for generation of a consensus or generic sequence include any number and combination of allelic variants of the same gene, orthologous, or paralogous sequences as provided herein. Optionally, similar sequences used in generating a consensus or generic sequence are identified using the BLAST algorithm's smallest sum probability (P(N)). Various suppliers of sequence-analysis software are listed in chapter 7 of Current Protocols in Molecular Biology, F. M. Ausubel et al., Eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (Supplement 30). A polynucleotide sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, more preferably less than about 0.01 or 0.001, and most preferably less than about 0.0001 or 0.00001. Similar polynucleotides can be aligned and a consensus or generic sequence generated using multiple sequence alignment software available from a number of commercial suppliers, such as the Genetics Computer Group's (GCG®, Accelrys Inc., San Diego, Calif., USA) PILEUP software, Vector NTI's (North Bethesda, Md.) ALIGNX, or Genecode's (Ann Arbor, Mich.) SEQUENCHER. Conveniently, default parameters of such software can be used to generate consensus or generic sequences.

[0169] Computer Applications

[0170] The present invention provides machines, data structures, and processes for modeling or analyzing the polynucleotides and polypeptides of the present invention.

[0171] A. Machines and Data Structures

[0172] The present invention provides a machine having a memory comprising data representing a sequence of a polynucleotide or polypeptide of the present invention. The machine of the present invention is typically a digital computer. The memory of such a machine includes, but is not limited to, ROM, RAM, or computer-readable media such as, but not limited to, magnetic media such as computer disks or hard drives, or media such as CD-ROM. Thus, the present invention also provides a data structure comprising a sequence of a polynucleotide of the present invention embodied in a computer-readable medium. As those of skill in the art will be aware, the form of memory of a machine of the present invention or the particular embodiment of the computer-readable medium is not a critical element of the invention and can take a variety of forms.

[0173] B. Homology Searches

[0174] The present invention provides a process for identifying a candidate homologue (i.e., an ortholog or paralog) of a polynucleotide or polypeptide of the present invention. A candidate homologue has a statistically significant probability of having the same biological function (e.g., catalyzes the same reaction, binds to homologous proteins/nucleic acids) as the reference sequence to which it is compared. Accordingly, the polynucleotides and polypeptides of the present invention have utility in identifying homologs in animals or other plant species, particularly those in the family Gramineae such as, but not limited to, sorghum, wheat, or rice.

[0175] The process of the present invention comprises obtaining data representing a polynucleotide or polypeptide test sequence. Test sequences are generally at least 25 amino acids in length or at least 50 nucleotides in length. Optionally, the test sequence can be at least 50, 100, 150, 200, 250, 300, or 400 amino acids in length. A test polynucleotide can be at least 50, 100, 200, 300, 400, or 500 nucleotides in length. Often the test sequence will be a full-length sequence. Test sequences can be obtained from a nucleic acid of an animal or plant. Optionally, the test sequence is obtained from a plant species other than maize; its function is uncertain, and it will be compared to the reference sequence to determine sequence similarity or sequence identity. For example, such plant species can be of the family Gramineae, such as wheat, rice, or sorghum. The test sequence data are entered into a machine, typically a computer, having a memory that contains data representing a reference sequence. The reference sequence can be the sequence of a polypeptide or a polynucleotide of the present invention and is often at least 25 amino acids or 100 nucleotides in length. As those of skill in the art are aware, the greater the sequence identity/similarity between a reference sequence of known function and a test sequence, the greater the probability that the test sequence will have the same or similar function as the reference sequence.
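The notion of sequence identity between a test and a reference sequence can be made concrete with a short sketch. It assumes the two sequences are already aligned to equal length with "-" as the gap character, and it computes identity over columns where neither sequence has a gap. This is only one of several conventions; programs such as GAP and BLAST perform the alignment themselves and may use a different denominator. The function name is hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Five gap-free columns, four identical: 80% identity under this convention.
print(percent_identity("ACGT-A", "ACCTTA"))
```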

[0176] The machine further comprises a sequence comparison means for determining the sequence identity or similarity between the test sequence and the reference sequence. Exemplary sequence comparison means are provided in the sequence analysis software discussed previously. Optionally, sequence comparison is established using the BLAST or GAP suite of programs.

[0177] The results of the comparison between the test and reference sequences can be displayed. Generally, a smallest sum probability value (P(N)) of less than 0.1, or alternatively less than 0.01, 0.001, 0.0001, or 0.00001, using the BLAST 2.0 suite of algorithms under default parameters identifies the test sequence as a candidate homologue (i.e., an allele, ortholog, or paralog) of the reference sequence. A nucleic acid comprising a polynucleotide having the sequence of the candidate homologue can be constructed using well-known library isolation, cloning, or in vitro synthetic chemistry techniques (e.g., phosphoramidite) such as those described herein. In additional embodiments, a nucleic acid comprising a polynucleotide having a sequence represented by the candidate homologue is introduced into a plant; typically, these polynucleotides are operably linked to a promoter. Confirmation of the function of the candidate homologue can be established by operably linking the candidate homologue nucleic acid to, for example, an inducible promoter, or by expressing the antisense transcript, and analyzing the plant for changes in phenotype consistent with the presumed function of the candidate homologue. Optionally, the plant into which these nucleic acids are introduced is a monocot, such as a member of the family Gramineae. Exemplary plants include maize, sorghum, wheat, rice, canola, alfalfa, cotton, sunflower, safflower, millet, barley, and soybean.
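The P(N) cutoff above can be applied to search results with a few lines of code. This sketch does not run BLAST; it assumes the smallest sum probabilities have already been parsed from the search output into simple records. The `Hit` record, its field names, and the subject identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database match from a similarity search (illustrative fields)."""
    subject_id: str
    smallest_sum_probability: float


def candidate_homologues(hits, threshold=0.1):
    """Hits whose P(N) is below threshold, most significant first."""
    selected = [h for h in hits if h.smallest_sum_probability < threshold]
    return sorted(selected, key=lambda h: h.smallest_sum_probability)


hits = [Hit("seqA", 0.5), Hit("seqB", 1e-6), Hit("seqC", 0.05)]
print([h.subject_id for h in candidate_homologues(hits, threshold=0.001)])
```

Lowering `threshold` from 0.1 toward 0.00001 trades sensitivity for stringency, mirroring the series of cutoffs listed in the paragraph.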

[0178] C. Computer Modeling

[0179] The present invention provides a process of modeling/analyzing data representative of the sequence of a polynucleotide or polypeptide of the present invention. The process comprises entering sequence data of a polynucleotide or polypeptide of the present invention into a machine, manipulating the data to model or analyze the structure or activity of the polynucleotide or polypeptide, and displaying the results of the modeling or analysis. A variety of modeling and analytic tools are well known in the art and available from such commercial vendors as Genetics Computer Group (GCG®, Accelrys Inc., San Diego, Calif., USA). Included amongst the modeling/analysis tools are methods to: 1) recognize overlapping sequences (e.g., from a sequencing project) with a polynucleotide of the present invention and create an alignment called a “contig”; 2) identify restriction enzyme sites of a polynucleotide of the present invention; 3) identify the products of a T1 ribonuclease digestion of a polynucleotide of the present invention; 4) identify PCR primers with minimal self-complementarity; 5) compare two protein or nucleic acid sequences and identify points of similarity or dissimilarity between them; 6) compute pairwise distances between sequences in an alignment, reconstruct phylogenetic trees using distance methods, and calculate the degree of divergence of two protein coding regions; 7) identify patterns such as coding regions, terminators, repeats, and other consensus patterns in polynucleotides of the present invention; 8) identify RNA secondary structure; 9) identify sequence motifs, isoelectric point, secondary structure, hydrophobicity, and antigenicity in polypeptides of the present invention; and 10) translate polynucleotides of the present invention and backtranslate polypeptides of the present invention.
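As one example of the analysis tools listed above (item 2, restriction enzyme sites), a minimal site finder can scan both strands for an exact recognition sequence. This sketch assumes unambiguous A/C/G/T recognition sites and reports 0-based start positions on the top strand; it ignores cut-site offsets and degenerate IUPAC codes. The Not I (GCGGCCGC) and Sal I (GTCGAC) sites used in the example are the ones named in Example 1; the test sequence itself is made up.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """0-based top-strand start positions of `site` on either strand.

    A zero-width lookahead is used so overlapping matches are all reported.
    For palindromic sites the reverse complement equals the site itself, so
    each position is reported once.
    """
    seq, site = seq.upper(), site.upper()
    hits = set()
    for pattern in {site, reverse_complement(site)}:
        hits.update(m.start() for m in re.finditer(f"(?={re.escape(pattern)})", seq))
    return sorted(hits)


construct = "AAGCGGCCGCTTGTCGACAA"  # made-up sequence with Not I and Sal I sites
print(find_sites(construct, "GCGGCCGC"), find_sites(construct, "GTCGAC"))
```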

[0180] Detection of Nucleic Acids

[0181] The present invention further provides methods for detecting a polynucleotide of the present invention in a nucleic acid sample suspected of containing a polynucleotide of the present invention, such as a plant cell lysate, particularly a lysate of maize. In some embodiments, a gene of the present invention or portion thereof can be amplified prior to the step of contacting the nucleic acid sample with a polynucleotide of the present invention. The nucleic acid sample is contacted with the polynucleotide to form a hybridization complex. The polynucleotide hybridizes under stringent conditions to a gene encoding a polypeptide of the present invention. Formation of the hybridization complex is used to detect a gene encoding a polypeptide of the present invention in the nucleic acid sample. Those of skill will appreciate that an isolated nucleic acid comprising a polynucleotide of the present invention should lack cross-hybridizing sequences in common with non-target genes that would yield a false positive result. Detection of the hybridization complex can be achieved using any number of well-known methods. For example, the nucleic acid sample, or a portion thereof, may be assayed by hybridization formats including, but not limited to, solution phase, solid phase, mixed phase, or in situ hybridization assays.
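The requirement that a probe lack sequences in common with non-target genes can be approximated computationally by looking for shared exact subsequences (k-mers) on either strand. This is a heuristic sketch under stated assumptions: exact k-mer sharing is only a proxy for cross-hybridization under stringent conditions, and the word length `k`, the function names, and the gene names are hypothetical choices, not values from this description.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cross_hybridization_flags(probe, non_targets, k=20):
    """Map each non-target name to the number of probe k-mers it shares.

    Both probe strands are considered; non-targets sharing no k-mer are omitted.
    """
    probe_words = kmers(probe, k) | kmers(reverse_complement(probe), k)
    flagged = {}
    for name, seq in non_targets.items():
        shared = probe_words & kmers(seq, k)
        if shared:
            flagged[name] = len(shared)
    return flagged


flags = cross_hybridization_flags(
    "ACGTACGTAA", {"geneX": "TTACGTACGG", "geneY": "GGGGGGGGGG"}, k=5
)
print(flags)
```

A probe that returns flags for known non-target genes would be redesigned or trimmed before use, in line with the goal of avoiding false positives.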

[0182] Detectable labels suitable for use in the present invention include any composition detectable by spectroscopic, radioisotopic, photochemical, biochemical, immunochemical, electrical, optical, or chemical means. Useful labels in the present invention include biotin for staining with labeled streptavidin conjugate, magnetic beads, fluorescent dyes, radiolabels, enzymes, and colorimetric labels. Other labels include ligands which bind to antibodies labeled with fluorophores, chemiluminescent agents, and enzymes. Labeling the nucleic acids of the present invention is readily achieved, for example, by the use of labeled PCR primers.

[0183] Although the present invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.

EXAMPLE 1

[0184] This example describes the construction of the cDNA libraries.

[0185] Total RNA Isolation

[0186] The RNA for SEQ ID NO: 1 was isolated from salicylic acid-infiltrated V3/N4 leaf tissue (minus the midrib) from maize line B73. Tissue was collected 4 hours, 21 hours, and 7 days after infiltration and pooled. Total RNA was isolated from maize tissues with TRIZOL Reagent (Life Technology Inc., Gaithersburg, Md.) using a modification of the guanidine isothiocyanate/acid-phenol procedure described by Chomczynski and Sacchi (Chomczynski, P. and Sacchi, N. Anal. Biochem. 162:156, 1987). In brief, plant tissue samples were pulverized in liquid nitrogen before the addition of the TRIZOL Reagent, and then were further homogenized with a mortar and pestle. Chloroform was added, followed by centrifugation, to separate an aqueous phase and an organic phase. The total RNA was recovered from the aqueous phase by precipitation with isopropyl alcohol.

[0187] Poly(A)+ RNA Isolation

[0188] The selection of poly(A)+ RNA from total RNA was performed using the POLYATTRACT system (mRNA isolation system, Promega Corporation, Madison, Wis.). In brief, biotinylated oligo(dT) primers were used to hybridize to the 3′ poly(A) tails on mRNA. The hybrids were captured using streptavidin coupled to paramagnetic particles and a magnetic separation stand. The mRNA was washed under high-stringency conditions and eluted with RNase-free deionized water.

[0189] cDNA Library Construction

[0190] cDNA synthesis was performed and unidirectional cDNA libraries were constructed using the SUPERSCRIPT Plasmid System (Life Technology Inc., Gaithersburg, Md.). The first strand of cDNA was synthesized by priming with an oligo(dT) primer containing a Not I site. The reaction was catalyzed by SUPERSCRIPT Reverse Transcriptase II at 45° C. The second strand of cDNA was labeled with alpha-³²P-dCTP, and a portion of the reaction was analyzed by agarose gel electrophoresis to determine cDNA sizes. cDNA molecules smaller than 500 base pairs and unligated adapters were removed by Sephacryl-S400 chromatography. The selected cDNA molecules were ligated into the pSPORT1 vector between the Not I and Sal I sites.

EXAMPLE 2

[0191] This example describes cDNA sequencing and library subtraction.

[0192] Sequencing Template Preparation

[0193] Individual colonies were picked and DNA was prepared either by PCR with M13 forward and M13 reverse primers, or by plasmid isolation. All the cDNA clones were sequenced using M13 reverse primers.

[0194] Q-bot Subtraction Procedure

[0195] cDNA libraries subjected to the subtraction procedure were plated out on 22×22 cm agar plates at a density of about 3,000 colonies per plate. The plates were incubated in a 37° C. incubator for 12-24 hours. Colonies were picked into 384-well plates by a robotic colony picker, Q-bot (GENETIX Limited). These plates were incubated overnight at 37° C. Once sufficient colonies were picked, they were pinned onto 22×22 cm nylon membranes using Q-bot. Each membrane contained 9,216 colonies or 36,864 colonies. These membranes were placed onto agar plates with the appropriate antibiotic. The plates were incubated at 37° C. overnight. After colonies were recovered on the second day, these filters were placed on filter paper prewetted with denaturing solution for four minutes, then were incubated on top of a boiling water bath for an additional four minutes. The filters were then placed on filter paper prewetted with neutralizing solution for four minutes. After excess solution was removed by placing the filters on dry filter papers for one minute, the colony side of the filters was placed into Proteinase K solution and incubated at 37° C. for 40-50 minutes. The filters were placed on dry filter papers to dry overnight. DNA was then cross-linked to the nylon membrane by UV light treatment.
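The two membrane densities can be related to the 384-well source plates with simple arithmetic. The sketch below assumes each well is pinned once onto a membrane; the paragraph does not state the spotting pattern, so a duplicate-spotting layout would change the plate counts. The constant and function names are illustrative.

```python
WELLS_PER_SOURCE_PLATE = 384


def source_plates_per_membrane(colonies_per_membrane, wells=WELLS_PER_SOURCE_PLATE):
    """Number of full source plates represented on one membrane.

    Assumes one spot per well; raises if the density is not a whole
    number of plates.
    """
    plates, leftover = divmod(colonies_per_membrane, wells)
    if leftover:
        raise ValueError("membrane density is not a whole number of plates")
    return plates


for density in (9216, 36864):
    print(density, "colonies ->", source_plates_per_membrane(density), "plates")
```

Under the one-spot-per-well assumption, 9,216 colonies correspond to 24 source plates and 36,864 colonies to 96.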

[0196] Colony hybridization was conducted as described by Sambrook, J., Fritsch, E. F. and Maniatis, T. (in Molecular Cloning: A Laboratory Manual, 2nd Edition). The following probes were used in colony hybridization:

[0197] 1. First-strand cDNA made from the same tissue as the library, to remove the most redundant clones.

[0198] 2. The 48-192 most redundant cDNA clones from the same library, based on previous sequencing data.

[0199] 3. The 192 most redundant cDNA clones in the entire maize sequence database.

[0200] 4. A Sal-A20 oligonucleotide, TCG ACC CAC GCG TCC GAA AAA AAA AAA AAA AAA AAA (SEQ ID NO: 3), to remove clones containing a poly(A) tail but no cDNA.

[0201] 5. cDNA clones derived from rRNA.
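The Sal-A20 probe in item 4 can be checked against SEQ ID NO: 3 in a few lines. The sketch assumes, following the sequence-listing description ("based upon the adapter sequence and poly T"), that the first 16 nucleotides are adapter-derived and the remaining 20 are the poly(A) run. On one reading of the design, only a clone in which the adapter is joined directly to the poly(A) tail, with no cDNA in between, presents the full 36-nt complementary target. The constant names are hypothetical.

```python
ADAPTER_DERIVED = "TCGACCCACGCGTCCG"  # first 16 nt of SEQ ID NO: 3
POLY_A_LENGTH = 20                     # the "A20" in Sal-A20

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sal_a20_probe() -> str:
    """Adapter-derived bases followed directly by 20 A residues."""
    return ADAPTER_DERIVED + "A" * POLY_A_LENGTH


# SEQ ID NO: 3 as printed in the sequence listing (10-nt groups)
SEQ_ID_NO_3 = "tcgacccacgcgtccgaaaa aaaaaaaaaa aaaaaa".replace(" ", "").upper()

probe = sal_a20_probe()
assert probe == SEQ_ID_NO_3 and len(probe) == 36

# Strand the probe anneals to: a poly(T) run joined directly to the adapter
# complement, as expected for an "empty" clone with no cDNA insert.
print(reverse_complement(probe))
```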

[0202] The autoradiograph was scanned into a computer, and the signal intensity and address of each colony were analyzed to identify cold (non-hybridizing) colonies. Cold colonies were re-arrayed from 384-well plates into 96-well plates using the Q-bot.
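The cold-colony call and re-array step can be sketched as a threshold on scanned signal followed by packing the selected addresses into consecutive 96-well plates. The data layout, the threshold value and all names below are hypothetical; the actual scanner and Q-bot file formats are not described in the text.

```python
from typing import Dict, List, Tuple

Address = Tuple[int, str]  # (plate number, well name such as "A1")


def well_names(rows: int, cols: int) -> List[str]:
    """Row-major well names, e.g. A1..A12, B1..H12 for an 8 x 12 plate."""
    return [f"{chr(ord('A') + r)}{c + 1}" for r in range(rows) for c in range(cols)]


WELLS_96 = well_names(8, 12)


def _address_key(addr: Address) -> Tuple[int, str, int]:
    plate, well = addr
    return plate, well[0], int(well[1:])  # sort A2 before A10


def call_cold_colonies(signal: Dict[Address, float], threshold: float) -> List[Address]:
    """Addresses whose hybridization signal falls below a chosen cut-off."""
    cold = [addr for addr, value in signal.items() if value < threshold]
    return sorted(cold, key=_address_key)


def rearray_to_96(cold: List[Address]) -> List[Tuple[Address, Address]]:
    """Pair each cold source address with the next free well on 96-well plates."""
    moves = []
    for i, source in enumerate(cold):
        dest_plate, dest_index = divmod(i, len(WELLS_96))
        moves.append((source, (dest_plate + 1, WELLS_96[dest_index])))
    return moves


# Hypothetical scanner readings for four colonies (arbitrary units)
signal = {(1, "A1"): 1520.0, (1, "A2"): 35.0, (1, "B1"): 12.0, (2, "P24"): 40.0}
for source, dest in rearray_to_96(call_cold_colonies(signal, threshold=100.0)):
    print(source, "->", dest)
```

Packing row-major keeps the destination plates full, which minimizes the number of 96-well plates carried into sequencing.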

EXAMPLE 3

[0203] This example describes identification of the gene by a computer homology search. Gene identities were determined by conducting BLAST (Basic Local Alignment Search Tool; Altschul, S. F., et al., (1990) J. Mol. Biol. 215:403-410) searches under default parameters for similarity to sequences contained in the BLAST “nr” database (comprising all non-redundant GenBank CDS translations, sequences derived from the 3-dimensional structure Brookhaven Protein Data Bank, the last major release of the SWISS-PROT protein sequence database, and the EMBL and DDBJ databases). The cDNA sequences were analyzed for similarity to all publicly available DNA sequences contained in the “nr” database using the BLASTN algorithm. The DNA sequences were translated in all reading frames and compared for similarity to all publicly available protein sequences contained in the “nr” database using the BLASTX algorithm (Gish, W. and States, D. J. Nature Genetics 3:266-272 (1993)) provided by the NCBI. In some cases, sequencing data from two or more clones containing overlapping segments of DNA were used to construct contiguous DNA sequences.
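BLASTX performs the "all reading frames" translation internally. The sketch below only illustrates the six-frame idea with the standard genetic code (NCBI translation table 1); it is not NCBI's implementation, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with ambiguous bases."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict:
    """Translations of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# First 13 codons of the SEQ ID NO: 1 coding sequence (nt 73-111)
frames = six_frame_translation("atgccggagctgccggaggtggaggcggcgcgtcgggcg")
print(frames["+1"])  # MPELPEVEAARRA
```

Frame +1 reproduces the first 13 residues of SEQ ID NO: 2 (Met Pro Glu Leu Pro Glu Val Glu Ala Ala Arg Arg Ala).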

EXAMPLE 4

[0204] This example shows the relevant features of the maize mutM polypeptide.
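The coordinates given in the sequence listing can be cross-checked with simple arithmetic. The CDS feature (73)...(1224) spans 1152 nucleotides, or 384 codons, matching the 384-residue SEQ ID NO: 2; the TAA stop codon begins at nucleotide 1225, just outside the annotated CDS. The sketch below is a consistency check only, and its helper names are hypothetical.

```python
CDS_START, CDS_END = 73, 1224  # CDS feature of SEQ ID NO: 1, 1-based inclusive


def cds_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based interval."""
    return end - start + 1


def codon_span(residue: int, cds_start: int = CDS_START) -> tuple:
    """Nucleotide positions (1-based, inclusive) encoding a given residue."""
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


length = cds_length(CDS_START, CDS_END)
residues, leftover = divmod(length, 3)
print(length, residues, leftover)   # 1152 384 0
print(codon_span(13))               # (109, 111)
print(codon_span(residues))         # (1222, 1224)
```

Residue 13 ending at nucleotide 111 agrees with the position marker printed after the first codon row of the listing.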

EXAMPLE 5

[0205] This example provides methods of plant transformation and regeneration using the polynucleotides of the present invention, as well as a method to determine their effect on transformation efficiency.

[0206] A. Transformation by Particle Bombardment

[0207] Transformation of a mutM construct along with a marker-expression cassette (for example, UBI::moPAT-GFPm::pinII) into genotype Hi-II follows a well-established bombardment transformation protocol used for introducing DNA into the scutellum of immature maize embryos (Songstad, D. D. et al., In Vitro Cell Dev. Biol. Plant 32:179-183, 1996). Any suitable transformation method can be used instead, such as Agrobacterium-mediated transformation, among many others. To prepare suitable target tissue for transformation, ears are surface sterilized in 50% Clorox bleach plus 0.5% Micro detergent for 20 minutes and rinsed twice with sterile water. The immature embryos (approximately 1-1.5 mm in length) are excised and placed embryo axis side down (scutellum side up), 25 embryos per plate. These are cultured on medium containing N6 salts, Eriksson's vitamins, 0.69 g/l proline, 2 mg/l 2,4-D and 3% sucrose. After 4-5 days of incubation in the dark at 28° C., embryos are removed from the first medium and cultured on similar medium containing 12% sucrose. Embryos are allowed to acclimate to this medium for 3 h prior to transformation. The scutellar surface of the immature embryos is targeted using particle bombardment. Embryos are transformed using the Bio-Rad PDS-1000 Helium Gun at one shot per sample using 650 PSI rupture disks. DNA delivered per shot averages approximately 0.1667 μg. Following bombardment, all embryos are maintained on standard maize culture medium (N6 salts, Eriksson's vitamins, 0.69 g/l proline, 2 mg/l 2,4-D, 3% sucrose) for 2-3 days and then transferred to N6-based medium containing 3 mg/L Bialaphos®. Plates are maintained at 28° C. in the dark and observed for colony recovery, with transfers to fresh medium every two to three weeks. After approximately 10 weeks of selection, selection-resistant, GFP-positive callus clones can be sampled for the presence of mutM mRNA and/or protein. Positive lines are transferred to 288J medium, an MS-based medium with lower sucrose and hormone levels, to initiate plant regeneration. Following somatic embryo maturation (2-4 weeks), well-developed somatic embryos are transferred to germination medium and moved to the lighted culture room. Approximately 7-10 days later, developing plantlets are transferred to medium in tubes for 7-10 days until they are well established. Plants are then transferred to inserts in flats (equivalent to a 2.5″ pot) containing potting soil, grown for 1 week in a growth chamber and for an additional 1-2 weeks in the greenhouse, then transferred to Classic™ 600 pots (1.6 gallon) and grown to maturity. Plants are monitored for expression of mutM mRNA and/or protein. Recovered colonies and plants are scored based on visual GFP expression, leaf-painting sensitivity to a 1% application of Ignite® herbicide, and molecular characterization via PCR and Southern analysis.
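Scaling the embryo-culture media above to a working batch is straightforward arithmetic. The sketch below uses the per-litre amounts stated in the protocol and assumes, since the text does not say, that percent sucrose means w/v (grams per 100 mL). N6 salts and Eriksson's vitamins are omitted because their premix amounts are not given; the recipe and function names are hypothetical.

```python
# Per-litre amounts from the protocol; "%" sucrose is read as w/v, an assumption.
INITIATION_MEDIUM_G_PER_L = {
    "L-proline": 0.69,
    "2,4-D": 0.002,   # 2 mg/L
    "sucrose": 30.0,  # 3% (w/v)
}


def with_sucrose(recipe: dict, percent_w_v: float) -> dict:
    """Copy of a recipe with sucrose changed, e.g. 12% for the pre-bombardment medium."""
    updated = dict(recipe)
    updated["sucrose"] = percent_w_v * 10.0  # 1% w/v = 10 g/L
    return updated


def batch_grams(recipe_g_per_l: dict, volume_ml: float) -> dict:
    """Scale per-litre amounts to a batch volume in millilitres."""
    litres = volume_ml / 1000.0
    return {component: g_per_l * litres for component, g_per_l in recipe_g_per_l.items()}


OSMOTIC_MEDIUM_G_PER_L = with_sucrose(INITIATION_MEDIUM_G_PER_L, 12)
print(batch_grams(INITIATION_MEDIUM_G_PER_L, 500))
print(batch_grams(OSMOTIC_MEDIUM_G_PER_L, 500))
```

Because 2,4-D is needed in milligram amounts, in practice it is usually added from a concentrated stock rather than weighed per batch.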

[0208] B. Transformation by Agrobacterium

[0209] Transformation of a mutM cassette along with UBI::moPAT~moGFP::pinII into a maize genotype such as Hi-II (or inbreds such as Pioneer Hi-Bred International, Inc. proprietary inbreds N46 and P38) is also done using the Agrobacterium-mediated DNA delivery method described in U.S. Pat. No. 5,981,840, with the following modifications. Again, any suitable transformation method can be used, such as particle-mediated transformation, among many others. Agrobacterium cultures are grown to log phase in liquid minimal-A medium containing 100 μM spectinomycin. Embryos are immersed in a log-phase suspension of Agrobacteria adjusted to an effective concentration of 5×10⁸ cfu/ml. Embryos are infected for 5 minutes and then co-cultured on culture medium containing acetosyringone for 7 days at 20° C. in the dark. After 7 days, the embryos are transferred to standard culture medium (MS salts with N6 macronutrients, 1 mg/L 2,4-D, 1 mg/L Dicamba, 20 g/L sucrose, 0.6 g/L glucose, 1 mg/L silver nitrate, and 100 mg/L carbenicillin) with 3 mg/L Bialaphos® as the selective agent. Plates are maintained at 28° C. in the dark and observed for colony recovery, with transfers to fresh medium every two to three weeks. Positive lines are transferred to an MS-based medium with lower sucrose and hormone levels to initiate plant regeneration. Following somatic embryo maturation (2-4 weeks), well-developed somatic embryos are transferred to germination medium and moved to the lighted culture room. Approximately 7-10 days later, developed plantlets are transferred to medium in tubes for 7-10 days until they are well established. Plants are then transferred to inserts in flats (equivalent to a 2.5″ pot) containing potting soil, grown for 1 week in a growth chamber and for an additional 1-2 weeks in the greenhouse, then transferred to Classic™ 600 pots (1.6 gallon) and grown to maturity. Recovered colonies and plants are scored based on visual GFP expression, leaf-painting sensitivity to a 1% application of Ignite® herbicide, and molecular characterization via PCR and Southern analysis.
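Adjusting the Agrobacterium suspension to 5×10⁸ cfu/ml is a C₁V₁ = C₂V₂ dilution. The sketch assumes that the stock titer has been measured independently (for example by plating). It deliberately avoids any optical-density-to-cfu conversion factor, because none is given in the text. The stock titer and volume in the example are hypothetical.

```python
TARGET_CFU_PER_ML = 5e8  # effective concentration given in the protocol


def inoculum_volumes(stock_cfu_per_ml: float, final_volume_ml: float,
                     target_cfu_per_ml: float = TARGET_CFU_PER_ML) -> tuple:
    """Return (ml of culture, ml of diluent) from C1*V1 = C2*V2.

    The stock titer must be measured independently; no OD-to-cfu
    conversion is assumed here.
    """
    if stock_cfu_per_ml < target_cfu_per_ml:
        raise ValueError("stock is more dilute than the target; concentrate the cells first")
    culture_ml = target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml
    return culture_ml, final_volume_ml - culture_ml


# Hypothetical stock titer of 2e9 cfu/ml, 10 ml of infection suspension
print(inoculum_volumes(2e9, 10.0))
```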

[0210] C. Determining Changes in Transformation Efficiency

[0211] Introducing mutM by Agrobacterium or particle bombardment is expected to improve transformation frequency. The plasmids described in this example are used to transform Hi-II immature embryos by particle delivery or Agrobacterium. The effect of mutM can be measured by comparing the transformation efficiency of mutM constructs co-transformed with GFP constructs to that of GFP control constructs alone. Source embryos from individual ears are split between the two test groups to minimize any effect on transformation efficiency due to differences in starting material. Bialaphos-resistant GFP+ colonies are counted using a GFP microscope, and transformation frequencies are determined as the percentage of initial target embryos from which at least one GFP-expressing, bialaphos-resistant multicellular transformed event grows. In both particle gun and Agrobacterium experiments, transformation frequencies are expected to be greatly increased in the mutM treatment group. A similar strategy can be used to determine changes in the frequency of homologous recombination or of targeted gene modifications (chimeraplasty) using mutM.
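The transformation-frequency definition above can be written out as a small calculation. The per-ear record structure, its field names and the counts in the example are hypothetical. The paired per-ear difference reflects the split-ear design but is only one possible summary, and no formal statistical test is included.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EarSplit:
    """Embryos from one ear, split between the control and mutM groups."""
    ear_id: str
    control_embryos: int
    control_positive: int  # embryos with >= 1 GFP+, bialaphos-resistant event
    mutm_embryos: int
    mutm_positive: int


def frequency(positive: int, embryos: int) -> float:
    """Transformation frequency as a percentage of initial target embryos."""
    if embryos <= 0:
        raise ValueError("need at least one target embryo")
    return 100.0 * positive / embryos


def pooled_frequencies(splits: List[EarSplit]) -> Tuple[float, float]:
    """(control %, mutM %) pooled over all ears."""
    control = frequency(sum(s.control_positive for s in splits),
                        sum(s.control_embryos for s in splits))
    mutm = frequency(sum(s.mutm_positive for s in splits),
                     sum(s.mutm_embryos for s in splits))
    return control, mutm


def mean_paired_difference(splits: List[EarSplit]) -> float:
    """Mean per-ear difference (mutM minus control), in percentage points."""
    diffs = [frequency(s.mutm_positive, s.mutm_embryos)
             - frequency(s.control_positive, s.control_embryos) for s in splits]
    return sum(diffs) / len(diffs)


# Hypothetical counts for illustration only
splits = [EarSplit("ear1", 25, 2, 25, 5), EarSplit("ear2", 25, 3, 25, 6)]
print(pooled_frequencies(splits), mean_paired_difference(splits))
```

Counting embryos with at least one event, rather than total events, keeps the metric a true percentage of starting embryos, as the definition requires.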

[0212] D. Transient Expression of the mutM Polynucleotide Product

[0213] It may be desirable to express mutM transiently, in order to induce chimeraplasty for another gene of interest or to increase the transformation efficiency or homologous recombination of another polynucleotide of interest, without incorporating the mutM polynucleotide into the genome of the target cell. This can be done by delivering 5′-capped, polyadenylated mutM RNA or expression cassettes containing mutM DNA. These molecules can be delivered using a biolistic particle gun. For example, 5′-capped, polyadenylated mutM RNA can easily be made in vitro using Ambion's mMESSAGE mMACHINE kit. Following the procedure outlined above, the RNA is co-delivered along with DNA containing an agronomically useful expression cassette. The cells receiving the RNA will transiently express mutM, which will facilitate the integration of the polynucleotide or modification of interest. Plants regenerated from these embryos can then be screened for the presence of the gene or modification of interest.

[0214] The above examples are provided to illustrate the invention but not to limit its scope. Other variants of the invention will be readily apparent to one of ordinary skill in the art and are encompassed by the appended claims. All publications, patents, patent applications, and computer programs cited herein are hereby incorporated by reference.

1 3 1 1586 DNA Zea mays CDS (73)...(1224) 1 acccacgcgt ccgcggacgc gtggggacga gtgagcgaga ggaacgggga gagggaagtt 60 aaacgcgcgg cg atg ccg gag ctg ccg gag gtg gag gcg gcg cgt cgg gcg 111 Met Pro Glu Leu Pro Glu Val Glu Ala Ala Arg Arg Ala 1 5 10 ctg cag gcg cac tgc gtg ggg agg cgc atc gcg cgc tgc gcc gtg gcg 159 Leu Gln Ala His Cys Val Gly Arg Arg Ile Ala Arg Cys Ala Val Ala 15 20 25 gac gac gcc aag gtg gtc gtt gcc gcc gcc ggc cgc gcg gcc ttc gag 207 Asp Asp Ala Lys Val Val Val Ala Ala Ala Gly Arg Ala Ala Phe Glu 30 35 40 45 cgg gcc atg gtc ggc cgg acc atc gtc gcc gcg cgc cgg agg ggc aag 255 Arg Ala Met Val Gly Arg Thr Ile Val Ala Ala Arg Arg Arg Gly Lys 50 55 60 aac ctc tgg ctc cag ctc gac gcg ccg ccc ttc ccg tcc ttc cag ttc 303 Asn Leu Trp Leu Gln Leu Asp Ala Pro Pro Phe Pro Ser Phe Gln Phe 65 70 75 ggg atg gca ggc gcg ata tat atc aaa ggc att cct gtg acg aat tat 351 Gly Met Ala Gly Ala Ile Tyr Ile Lys Gly Ile Pro Val Thr Asn Tyr 80 85 90 aag aga tcc gtt gtt aat tcc gaa gag gag tgg ccc tcc aaa cac tct 399 Lys Arg Ser Val Val Asn Ser Glu Glu Glu Trp Pro Ser Lys His Ser 95 100 105 aaa ttc ttt gct gag ctt gat gat ggt ttg gag ttc tct ttc act gat 447 Lys Phe Phe Ala Glu Leu Asp Asp Gly Leu Glu Phe Ser Phe Thr Asp 110 115 120 125 aaa cgg cgc ttt gca aga gtt cgt ttg ttt gaa gat cct gaa acc tta 495 Lys Arg Arg Phe Ala Arg Val Arg Leu Phe Glu Asp Pro Glu Thr Leu 130 135 140 ccc cca att tct gag tta ggc cca gat gct ctg ttt gaa cca atg tcc 543 Pro Pro Ile Ser Glu Leu Gly Pro Asp Ala Leu Phe Glu Pro Met Ser 145 150 155 gtc gat agt ttc ttg gac tcc ctg ggt aga aag aag ata ggg ata aaa 591 Val Asp Ser Phe Leu Asp Ser Leu Gly Arg Lys Lys Ile Gly Ile Lys 160 165 170 gct ctt cta ctt gat cag agc ttc ata tca ggc att ggc aat tgg att 639 Ala Leu Leu Leu Asp Gln Ser Phe Ile Ser Gly Ile Gly Asn Trp Ile 175 180 185 gca gac gag gtg ctt tac cag tca agg atc cat cca tta cag att gct 687 Ala Asp Glu Val Leu Tyr Gln Ser Arg Ile His Pro Leu Gln Ile Ala 190 195 200 205 tcg aat cta cct agg gag agt tgt gaa gca ctg cac cag agt atc gaa 735 Ser Asn Leu Pro Arg Glu Ser Cys Glu Ala Leu His Gln Ser Ile Glu 210 215 220 gag gtt gtc aaa tat gct gtc gaa gtt gat gct gac atg gac cgc ttt 783 Glu Val Val Lys Tyr Ala Val Glu Val Asp Ala Asp Met Asp Arg Phe 225 230 235 ccg aag gaa tgg tta ttt cat cac cgt tgg ggc aag aag cct ggc aaa 831 Pro Lys Glu Trp Leu Phe His His Arg Trp Gly Lys Lys Pro Gly Lys 240 245 250 gtc gat gga aag aaa atc gag ttc ata aca gct ggt ggc agg acc act 879 Val Asp Gly Lys Lys Ile Glu Phe Ile Thr Ala Gly Gly Arg Thr Thr 255 260 265 gcc tac gtg ccg caa ctg caa aaa ctg gtt gga acc cag tcc agc aaa 927 Ala Tyr Val Pro Gln Leu Gln Lys Leu Val Gly Thr Gln Ser Ser Lys 270 275 280 285 acg ata tcc gtg gcc gag aac ggt gat gcc aag gat tca ggg acc gag 975 Thr Ile Ser Val Ala Glu Asn Gly Asp Ala Lys Asp Ser Gly Thr Glu 290 295 300 gga gaa gat gca gat gca gat gtt ttg aag cca aga aag cga gcc gcg 1023 Gly Glu Asp Ala Asp Ala Asp Val Leu Lys Pro Arg Lys Arg Ala Ala 305 310 315 acc tcc agg gga cag cga aac aaa gat acc gcc ggc tcg aga aaa gca 1071 Thr Ser Arg Gly Gln Arg Asn Lys Asp Thr Ala Gly Ser Arg Lys Ala 320 325 330 aga gga aat ggc gcc gat gct gag gcg gct gaa cca gca aca ggt gtc 1119 Arg Gly Asn Gly Ala Asp Ala Glu Ala Ala Glu Pro Ala Thr Gly Val 335 340 345 gtc gga agc aac agt gag caa gct ttt ggc caa gcc aac agt gac gct 1167 Val Gly Ser Asn Ser Glu Gln Ala Phe Gly Gln Ala Asn Ser Asp Ala 350 355 360 365 gtc gat aaa tca gat cgg gct aca aga cga tcg tcg agg aaa gtg aaa 1215 Val Asp Lys Ser Asp Arg Ala Thr Arg Arg Ser Ser Arg Lys Val Lys 370 375 380 gcc cgc aag taaatctgaa caaggtagcc agggatctgt ccatggagtt 1264 Ala Arg Lys tcatactggc cagcgtattt gcgcctctga gtaatgtatc ttaggaacag aagattatat 1324 tcatgctgca tattcctggg ggattcgctc cggaccaacg tttgctctgt tccctcggtg 1384 ctatggatag tagcatatct aggttgtgca taaatgcact gaggtttatg tactctttcc 1444 aatcttccat gatgctatgg aagaggtgat taggtgaaat gatgtttccc ctggcgcgtg 1504 cggttccacg catagttgcc gtaaagtgaa aaaaatacag attgcttaaa aaaaaaaaaa 1564 aaaaaaaaaa aaaaaaaaaa aa 1586

2 384 PRT Zea mays 2 Met Pro Glu Leu Pro Glu Val Glu Ala Ala Arg Arg Ala Leu Gln Ala 1 5 10 15 His Cys Val Gly Arg Arg Ile Ala Arg Cys Ala Val Ala Asp Asp Ala 20 25 30 Lys Val Val Val Ala Ala Ala Gly Arg Ala Ala Phe Glu Arg Ala Met 35 40 45 Val Gly Arg Thr Ile Val Ala Ala Arg Arg Arg Gly Lys Asn Leu Trp 50 55 60 Leu Gln Leu Asp Ala Pro Pro Phe Pro Ser Phe Gln Phe Gly Met Ala 65 70 75 80 Gly Ala Ile Tyr Ile Lys Gly Ile Pro Val Thr Asn Tyr Lys Arg Ser 85 90 95 Val Val Asn Ser Glu Glu Glu Trp Pro Ser Lys His Ser Lys Phe Phe 100 105 110 Ala Glu Leu Asp Asp Gly Leu Glu Phe Ser Phe Thr Asp Lys Arg Arg 115 120 125 Phe Ala Arg Val Arg Leu Phe Glu Asp Pro Glu Thr Leu Pro Pro Ile 130 135 140 Ser Glu Leu Gly Pro Asp Ala Leu Phe Glu Pro Met Ser Val Asp Ser 145 150 155 160 Phe Leu Asp Ser Leu Gly Arg Lys Lys Ile Gly Ile Lys Ala Leu Leu 165 170 175 Leu Asp Gln Ser Phe Ile Ser Gly Ile Gly Asn Trp Ile Ala Asp Glu 180 185 190 Val Leu Tyr Gln Ser Arg Ile His Pro Leu Gln Ile Ala Ser Asn Leu 195 200 205 Pro Arg Glu Ser Cys Glu Ala Leu His Gln Ser Ile Glu Glu Val Val 210 215 220 Lys Tyr Ala Val Glu Val Asp Ala Asp Met Asp Arg Phe Pro Lys Glu 225 230 235 240 Trp Leu Phe His His Arg Trp Gly Lys Lys Pro Gly Lys Val Asp Gly 245 250 255 Lys Lys Ile Glu Phe Ile Thr Ala Gly Gly Arg Thr Thr Ala Tyr Val 260 265 270 Pro Gln Leu Gln Lys Leu Val Gly Thr Gln Ser Ser Lys Thr Ile Ser 275 280 285 Val Ala Glu Asn Gly Asp Ala Lys Asp Ser Gly Thr Glu Gly Glu Asp 290 295 300 Ala Asp Ala Asp Val Leu Lys Pro Arg Lys Arg Ala Ala Thr Ser Arg 305 310 315 320 Gly Gln Arg Asn Lys Asp Thr Ala Gly Ser Arg Lys Ala Arg Gly Asn 325 330 335 Gly Ala Asp Ala Glu Ala Ala Glu Pro Ala Thr Gly Val Val Gly Ser 340 345 350 Asn Ser Glu Gln Ala Phe Gly Gln Ala Asn Ser Asp Ala Val Asp Lys 355 360 365 Ser Asp Arg Ala Thr Arg Arg Ser Ser Arg Lys Val Lys Ala Arg Lys 370 375 380

3 36 DNA Artificial Sequence Designed oligonucleotide based upon the adapter sequence and poly T to remove clones which have a poly A tail but no cDNA. 3 tcgacccacgcgtccgaaaa aaaaaaaaaa aaaaaa 36

What is claimed is:
 1. An isolated protein comprising a full-length polypeptide comprising at least 30 contiguous amino acids from the polypeptide of SEQ ID NO: 2, wherein the polypeptide has 8-oxyguanine DNA glycosylase (OGG) activity.
 2. An isolated protein comprising an amino acid sequence having at least 75% sequence identity over the entire length of SEQ ID NO: 2; wherein the percent identity is determined by the GAP algorithm under default parameters, and wherein said sequence encodes a polypeptide having 8-oxyguanine DNA glycosylase (OGG) activity.
 3. The isolated protein of claim 2, wherein the amino acid sequence has at least 80% sequence identity over the entire length of SEQ ID NO: 2.
 4. The isolated protein of claim 2, wherein the amino acid sequence has at least 85% sequence identity over the entire length of SEQ ID NO: 2.
 5. The isolated protein of claim 2, wherein the amino acid sequence has at least 90% sequence identity over the entire length of SEQ ID NO: 2.
 6. The isolated protein of claim 2, wherein the amino acid sequence has at least 95% sequence identity over the entire length of SEQ ID NO: 2.
 7. An isolated protein comprising a polypeptide encoded by SEQ ID NO: 1, wherein the protein has 8-oxyguanine DNA glycosylase (OGG) activity.
 8. An isolated protein comprising the polypeptide of SEQ ID NO: 2, wherein the protein has 8-oxyguanine DNA glycosylase (OGG) activity.